EQUITY FACTORS

Guillaume SIMON

Contents

1 Introduction
  1.1 From passive to active management
    1.1.1 A bit of history...
    1.1.2 Deviating from the benchmark?
  1.2 Active management and the financial industry
    1.2.1 Generating alpha is a hard task
    1.2.2 A first (intuitive) definition of factors
    1.2.3 Active management? Not everywhere!
    1.2.4 From Factors to Smart Beta
    1.2.5 Finally, is Alpha Dead?
    1.2.6 Structure of this course

2 A Statistical Toolkit
  2.1 Modelling equity returns
    2.1.1 Discrete-time modelling
    2.1.2 Continuous-time modelling
  2.2 A first glance at equity returns' moments
  2.3 Usual statistical assumptions on returns' distribution
  2.4 Aggregating returns
    2.4.1 Aggregation in the asset-dimension on one period
    2.4.2 Aggregation in the time-dimension on a single asset
    2.4.3 General case
  2.5 Moment Estimation
    2.5.1 Sample counterparts
    2.5.2 Estimation under the Gaussian assumption
    2.5.3 Volatility estimators
    2.5.4 Skewness and kurtosis

3 Equity Factors: General Presentation
  3.1 The Capital Asset Pricing Model
    3.1.1 Lessons from the CAPM
    3.1.2 Some elementary CAPM theory
    3.1.3 Empirical illustration
  3.2 Factor Theory
    3.2.1 Extending the CAPM equation: the three types of factor models
    3.2.2 Macroeconomic Factors
    3.2.3 Statistical Factors
    3.2.4 Fundamental Factors with observable betas
    3.2.5 Fundamental Factors with estimated factors: Fama and French
  3.3 Open questions
    3.3.1 Factors, the minimal requirements
    3.3.2 So, what is the message?
  3.4 Factors and the financial industry
    3.4.1 ARP or Smart Beta? Long Only or Long-Short?
    3.4.2 How does the financial industry use those ideas?
    3.4.3 The importance of portfolio construction
    3.4.4 The performance puzzle
    3.4.5 Factors or anomalies?

4 Intermezzo: Backtesting
  4.1 Backtesting
    4.1.1 Providing accurate backtests
    4.1.2 In/Out-Of-Sample
    4.1.3 Biases
  4.2 Performance statistics
    4.2.1 Sharpe ratio
  4.3 Statistical significance of performance
    4.3.1 Sharpe ratio annualization
    4.3.2 Testing significance with the Sharpe ratio

5 Main Equity Factors
  5.1 Presentation of main factors
    5.1.1 Value
    5.1.2 Momentum
    5.1.3 Size
    5.1.4 Low Vol
    5.1.5 Quality
  5.2 Correlation and statistical properties
    5.2.1 Foreword

6 The Dark Side of Equity Factors
  6.1 So many factors...
    6.1.1 Number of factors and p-hacking
    6.1.2 Alpha decay
  6.2 Disappointing performance
  6.3 Implementation Costs
    6.3.1 Estimating costs
    6.3.2 The optimistic figures
    6.3.3 The pessimistic figures
  6.4 Crowding
    6.4.1 General intuition
    6.4.2 Some crowding measures
    6.4.3 The particular case of the value spread

7 Stylized Facts on Equities
  7.1 What are stylized facts?
  7.2 Stylized facts on stock returns
    7.2.1 Returns
    7.2.2 Volatility
    7.2.3 Skewness

8 Markets' Heuristics
  8.1 Objectives of this chapter
    8.1.1 Motivations
    8.1.2 Computations and numerical examples
  8.2 Risk
    8.2.1 Volatility
    8.2.2 Betas
  8.3 Correlations
  8.4 Stock size
    8.4.1 Market capitalization
    8.4.2 Turnover
  8.5 Biases and links between variables
  8.6 Heuristics for stock characteristics
    8.6.1 Dividends
    8.6.2 Earnings and multiples

9 A Focus on the Covariance Matrix
  9.1 One Word On The Stock Covariance Matrix
    9.1.1 Mean-variance allocation
    9.1.2 Weights instability
    9.1.3 Should we work with covariance or correlation?
    9.1.4 PCA and SVD
  9.2 Eigen decomposition and financial interpretation
    9.2.1 Notations
    9.2.2 Eigenvalues
    9.2.3 Eigenvectors
  9.3 Heuristics

References

Notations and Acronyms

Notations

a, b, ... lowercase letters denote scalars
a, b, ... lowercase bold letters denote column-vectors
A, B, ... uppercase bold letters denote matrices
A' or A^T is the transpose of matrix A
A^{-1} is the inverse of matrix A
tr(A) is the trace of matrix A
det(A) is the determinant of matrix A
sign(x) is the vector of signs of the components of x
|φ| is the modulus of a vector φ associated to an AR process
I_N is for the N × N identity matrix
𝕀_N is for an N × N matrix of ones
e_N is for an N-dimensional vector of ones
1 is for a vector of ones, without precision on the dimension
t is used to represent time
T is used to represent the total number of time observations
N is used to represent the number of assets
p_{i,t} denotes the price of asset i at time t
diag(x_1, ..., x_N) denotes the N × N diagonal matrix with (x_1, ..., x_N) on the diagonal, 0 elsewhere
R_{i,t} denotes the random variable of the return of asset i at time t
R_{M,t} denotes the random variable of the return of the market at time t
R_t denotes the N-dimensional random vector of returns at time t
r_{i,t} denotes the observed return of asset i at time t
R^P_t denotes the random variable of the return of a portfolio P at time t
r^P_t denotes the observed return of a portfolio P at time t
r_{M,t} denotes the observed return of the market M at time t
r_{f,t} denotes the return of the risk-free asset at time t
r_t is an N-dimensional vector of returns at time t
r_k is the T-dimensional vector of returns for asset k
R = (r_k)_{k=1}^N is a T × N matrix of returns for the N observed assets
σ_{ij} denotes the covariance of two assets i and j
σ_{ii} denotes the variance of asset i
σ_i denotes the volatility of asset i, equal to √σ_{ii}
δ_{ij} or δ_i^j denotes the Kronecker delta, equal to 1 iff i = j, 0 otherwise
F = (f_k)_{k=1}^K represents the matrix of the K factor returns included in factor models
f_k represents the k-th column of F
f_t denotes the t-th row of F and is a K-dimensional vector of the factor values at t

v 1ˆ. denotes the indicator function xˆ denotes the estimated value of object x ¢ means distributed according to á means statistically independent Eˆ. stands for the statistical expectation Vˆ. stands for the variance Covˆ. stands for the covariance Vasˆ. stands for the asymptotic variance Covasˆ. stands for the asymptotic covariance corrˆ., . stands for the correlation between two variables N ˆm, σ2 stands for the Gaussian distribution with mean m and variance σ2 N ˆµ, ٍ stands for the multivariate Gaussian distribution with a mean vector µ and covariance Ω RˆT  stands for the range of an operator T Domˆf stands for the domain (or support) of a function f R represents real numbers RN represents a N-dimensional space based on R R represents real positive numbers, including zero R represents real negative numbers, including zero R‡ represents real, non-zero numbers Mk,n represents the set of k  n real-valued matrices M  n,n represents the set of n n real-valued, symmetric, positive, semi-definite matrices

Acronyms

AIC stands for Akaike Information Criterion
APT stands for Arbitrage Pricing Theory
CAPM stands for Capital Asset Pricing Model
FLS stands for Flexible Least Squares
ICA stands for Independent Component Analysis
iid stands for "independent and identically distributed"
LFM stands for Linear Factor Model
LO stands for Long Only
LS stands for Long Short
MaxDD stands for Maximum Drawdown
ML stands for Maximum Likelihood
MLE stands for Maximum Likelihood Estimator
MPT stands for Modern Portfolio Theory
MS stands for Markov Switching
MSE stands for Mean Square Error
OLS stands for Ordinary Least Squares
PCA stands for Principal Component Analysis
P&L stands for the Profits and Losses of a portfolio
RMT stands for Random Matrix Theory
SVD stands for Singular Value Decomposition

Chapter 1

Introduction

The focus of the present course is on equity factors. Factors are, in a nutshell, identifiable sources of risk that drive financial assets' returns. They are both risk drivers and potential sources of performance. Their discovery and the academic research underlying their identification unfolded along the history of modern finance throughout the 20th century. The increasing interest in factors and the burst of investable products in recent years are the main motivation for this course. The aim of this short introductory chapter is to detail the history of modern finance, to understand how and why the interest in factors appeared and how a whole industry has been built on this concept.

1.1 From passive to active management

1.1.1 A bit of history...

In the 50s, Harry Markowitz laid the foundations of Modern Portfolio Theory (Markowitz (1952), Markowitz (1959)): his work helped to understand the concept of diversification, which helps to increase risk-adjusted returns. Markowitz also established the use of variance to model risk and gave birth to the concept of the efficient frontier, along which investors usually compare the optimality of their investments. This work is the first pillar of modern finance. This research, done in the 50s, was complemented approximately ten years later by the CAPM. Beyond the mean-variance optimization framework stated by Markowitz, the Capital Asset Pricing Model (henceforth CAPM) states that when markets are at equilibrium, any efficient portfolio should be a combination of a risk-free asset with an optimal tangency portfolio whose weights are determined only by the market capitalization of the stocks.

The messages of the CAPM must not be conflated with those of mean-variance allocation; they are, however, of equal importance for the way we conceive of modern finance. The CAPM was independently developed by Treynor (1961), Treynor (2007), Sharpe (1964), Lintner (1965) and Mossin (1966) (see also e.g. Korajczyk (1999) or the review of Perold (2004)). According to the CAPM, the optimal portfolio is made of all available assets, with weights proportional to their market capitalization. This portfolio helps the theory of efficient markets to be self-fulfilling, since tracking it naturally reinforces the relative market-capitalization structure at equilibrium.

However, if investors seek to replicate this target themselves, their portfolios are, according to this theory, necessarily sub-optimal with respect to this ideal benchmark once costs are taken into account. Any investor deviating from this target should end up inside the efficient set, due to fees and transaction costs. Consequently, it is both easier and more optimal for investors to buy ready-made market-capitalization indices.

However, the notion of index was not new in the middle of the 20th century when Modern Portfolio Theory arose. Lo (2016) provides an interesting review of index history, from which some of the events described here are taken. The DJIA (Dow Jones Industrial Average) equity index was born in 1896, whereas the first index, the DJTA (Dow Jones Transportation Average, still defined today) was created in 1884. Standard & Poor's elaborated its first index around 1923, and built the current S&P500 equity index as we know it now in 1957. An index is typically meant to represent the evolution of one market, of one sector or of one geographical area.

Dow indices began to show a non-trivial weighting in 1928, when some price-weighting was introduced, whereas S&P indices were already market-cap weighted, following Irving Fisher's intuition. Things really changed in the 1970s. In 1971, Wells Fargo launched the first indexed account, for the Samsonite company; the three people behind this product were John McQuown, James Vertin and William Fouse. Wells Fargo pursued an indexed investing activity in the following years, while in 1975 Vanguard and John Bogle launched the Vanguard First Index Trust, one of the first index mutual funds. What really changed is that an index became investable: buying one single product allows you to track the desired index. In the wake of the October 1987 crash, the first ETF (Exchange Traded Fund), the SPY, was created in January 1993 to replicate the S&P500 index. SPY is now the most-traded security in the world, with ETFs representing roughly $3 trillion in assets worldwide (see Balchunas (2016)).

1.1.2 Deviating from the benchmark?

The CAPM bridged a gap at a time when the empirical link between risk and return was not heavily investigated, and it helped to formalize the theory of decision under uncertainty. Naturally, not all market participants follow this approach strictly, and this creates distortions that prevent markets from being at equilibrium, which generates trading. Investing in those indices has naturally gained the name of "passive" investing because of this mechanism and the self-realizing way of computing weights. This explains the long-standing success of market-capitalization indices: they offer the most straightforward and direct access for investors seeking exposure to equity markets, in addition to having strong theoretical roots in the academic literature on efficient markets.

ABC Vocabulary

Up to the beginning of the 2010s, passive investing was really understood as investing along benchmarks, through market-capitalisation indices. Other indexing strategies may now fall into the category of passive investing. Key elements of an index definition are, for instance, defined and highlighted in Gander et al. (2012): a large capacity and sufficient liquidity; the possibility of replication in a systematic and objective way; and, finally, the index that is built must be representative.

Lo (2016) proposes a quite similar definition, namely "a portfolio strategy that satisfies three properties": a strategy that is transparent, investable, and systematic in the sense that it is completely rule-based and without any discretionary intervention. Lo asks, as Merton did, the fundamental question of "what function does an index serve" and identifies two different purposes of modern indices: an informational purpose, to wrap economic insights; and a benchmarking purpose, to serve as a reference for active managers. Rather than separating active and passive indices by their proximity to market-cap weighting, Lo (2016) makes the distinction by calling traditional indices "static" and sophisticated ones "dynamic".

Market-cap indices are obviously not the only option for investors. First, their optimality is questioned: empirically, market-cap indices are in fact not optimal according to real data (see e.g. Thompson et al. (2006), Clare et al. (2013b)). Second, there are also theoretical arguments that dismiss passive indices as optimal investments. Haugen and Baker (1991) were among the first contributors to consider market-capitalization weighted indices as "inefficient investments". The authors underline that as soon as the usual fundamental assumptions behind the theory do not hold, capitalization-weighted portfolios are no longer efficient. Third, investors may be constrained in practice, which invalidates the full application of the theoretical CAPM setting. Examples of such constraints are those related to short-selling. In addition to those constraints, and independently of their own appreciation of future market evolution, investors differ on various aspects: regulatory constraints, geographical biases, exposure to taxes, operational constraints, access to data, etc. An uncompensated risk therefore remains, meaning that for the given level of return of market-cap indices, there are still portfolios with lower risk. This is not a case against market efficiency in itself (Haugen and Baker (1991)).

But is everyone able to beat this market-capitalization benchmark? This is a more subtle question. Not every investor is qualified or able to do so (see again e.g. Perold (2007)). Whether managers effectively add value is still debated. Therefore, any investor willingly and dynamically managing a portfolio with the aim of producing a better performance than the passive indices has naturally been called an active manager. Managing actively is a notion that is relative in time and style, as we will see below. The first implicit assumption is that an active manager is believed to add value by deploying skills to deviate from passive benchmarks represented by market-cap indices. A major contribution to the debate between passive and active management is the paper of Carhart (1997), which dismisses the general existence of added skills in mutual fund performance, in contradiction with the earlier results of Hendricks et al. (1993).

Deviating from a benchmark has to be understood in a general sense: not all fund managers compare themselves to a benchmark, of course. It means taking actions that make the equity portfolio different from the market-capitalization weighted portfolio. In earlier work, Brinson et al. (1986) identified that market timing and security picking should account for less than 10% of institutional portfolio returns. But managers seeking total returns, managing absolute-return processes, obviously qualify for the definition of active managers, and the notion is, in a way, related to the message conveyed by Grinold and Kahn (2000), who consider that active management is intrinsically linked to forecasting.

ABC Vocabulary

The term alpha is used as a proxy for extra-performance with respect to "a" benchmark. The true econometric definition of alpha, as originally introduced by Jensen (1968), is linked in essence to the beta, i.e. the sensitivity to the benchmark, the two quantities having to be determined jointly. This comes from the simple econometric equation:

$$R_{i,t} - R_{f,t} = \alpha + \beta \left( R_{M,t} - R_{f,t} \right) + \epsilon_t$$

where $R_{f,t}$ is the risk-free rate, $R_{M,t}$ is the time series of returns of the benchmark, and $R_{i,t}$ is the time series of returns of a financial asset. However, in this introduction we rather speak about a general and not-so-precise view of alpha, which has to be understood as a loose proxy of the extra-performance generated when an informal comparison to benchmarks (including the performance of competitors) is made. Under this very crude definition, we may also understand alpha as the attractiveness that a fund manager may have in the eyes of potential investors.
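As an illustration of this regression, here is a minimal sketch that estimates Jensen's alpha and beta by ordinary least squares; the synthetic return series and parameter values are hypothetical and only serve to show the mechanics of the estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: 252 daily benchmark excess returns and an
# asset whose "true" alpha and beta are chosen arbitrarily for illustration.
T = 252
market_excess = rng.normal(0.0003, 0.01, T)            # R_{M,t} - R_{f,t}
true_alpha, true_beta = 0.0002, 1.2
asset_excess = true_alpha + true_beta * market_excess + rng.normal(0, 0.005, T)

# OLS estimation of alpha and beta: regress asset excess returns on a
# constant and the benchmark excess returns.
X = np.column_stack([np.ones(T), market_excess])
coef, *_ = np.linalg.lstsq(X, asset_excess, rcond=None)
alpha_hat, beta_hat = coef

print(f"alpha_hat = {alpha_hat:.6f}, beta_hat = {beta_hat:.3f}")
```

With real data, the same regression would simply use observed asset, benchmark and risk-free return series in place of the simulated ones.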

3 1.2 Active management and the financial industry

It is widely documented that a simple idea, emanating from Don Phillips at Morningstar at the beginning of the 1990s, changed a lot of things in the approach: style boxes. This approach was initially set up to understand the differences between equity managers and identify their investing style. We may see style boxes as the pragmatic counterpart of the work of Fama and French that we will describe in Section 3.2.5. What were those style boxes? They were an attempt at investment style classification, sorting the stock holdings of various managers along two characteristics: whether the companies were classified as value (see later Section 5.1.1) or growth (growth being in a way anti-value), forming three buckets, and whether their market capitalization was small or large, again in three buckets. Crossing the buckets of each characteristic forms 9 boxes (from small-cap growth to large-cap value and conversely). However, for each style, no benchmark was available at that time. This was solved by Russell, which created indices relative to those Morningstar Style Boxes, relayed some years later by investable products through ETFs. This kind of analysis was trendy for years, helping investors to diversify their assets, without any dogmatic view on what was, at the time, considered as passive or active investment. It became, or is seen as, passive now, because investors are now used to seeing this kind of analysis. The idea of creating style ETFs, that is investable products backed by theoretical indices describing a style of management, was a discreet yet important shift in the industry. It created easy access to potential out-performance (with respect to traditional market-capitalisation indices) at a cheap cost and with a lot of simplicity: clear, transparent and rule-based, accessible to any investor independently of their sophistication. This simplicity made these products, backed by style or rule-based investment, the first bridge between traditional market-cap indices and pure alpha.

1.2.1 Generating alpha is a hard task

Let's start with the simple question of whether managers do or do not make money. Hedge funds (a.k.a. speculative funds, in simple terms...) show in general a high degree of attrition, even in periods when the market is up (see e.g. Fung and Hsieh (1997), Brown et al. (1999), Amin and Kat (2003)). This is a simple observation: all things being equal, after three years of activity, half of the funds may have shut down. This simple consideration shows that generating performance is a tough job! For surviving funds, the evidence that mutual fund managers are able to generate significantly superior performance is, if not unclear, at least not so frequent or not so persistent (see Fama and French (2010), Wimmer et al. (2013), Johnson et al. (2015)). In more recent literature, Kosowski et al. (2006), Jiang et al. (2007) or Cremers and Petajisto (2009), among others, give insights showing that in some situations managers may add some value in their investments. Kosowski et al. (2006) show that funds or managers diverge heavily in their attitude towards risk and in their idiosyncratic alpha: this may be explained by the diversity of approaches, cultures, constraints and mandates. Only a fraction of managers end up with profitable stock selection after costs, but this superior alpha seems to be persistent. Cremers and Petajisto (2009) find that funds that deviate substantially from their benchmark tend to outperform it after costs, contrary to less actively managed funds (i.e. those with the smallest deviation relative to the benchmark). This is consistent with the findings of Kacperczyk et al. (2005), whose results are in favour of active management delivering performance through effective bets on industries. Second, if there is any extra-performance, is this performance genuine and original? Bender et al. (2014) identified that nearly 80% of active excess returns are in general obtained through exposure of portfolios to factors... Factors? Factors! This happens to be the topic of the present course. Factors, which we have still not defined!

1.2.2 A first (intuitive) definition of factors

We presented it in a very simple way, but style boxes were a first attempt to highlight the systematic drivers of performance and risk in practice. The messages underlying the work of Fama and French (see Section 3.2.5) finally percolated into investors' minds, and investors started to identify more generic sources of risk. In general, asset managers pay more attention to absolute return realization than to risk and its origins. Factors, as style portfolios, were then a clear way to spot those drivers of risk. The factor representation allows one to identify the bets of an active investor, since any performance track record may be decomposed on those style factors. It became a clear way both to identify and classify active managers and to understand which category they belonged to. As identified by Bender et al. (2014), the fact that managers tilt their portfolios towards famous factors explains a large proportion of their returns. It then became possible to explain (at least a part of) the extra-performance generated by active managers in a systematic manner. And if active management is not able to generate a consistent and robust performance on all occasions, it became at least possible for clients to understand which kind of risk they were facing. It was then possible for them to explicitly target, choose and focus on specific investments, depending on the risk sources they wanted to be exposed to.

In this chapter we are concerned with finding an intuitive and qualitative definition of factors. The aim of this course is to develop the notion of factors on equities, but the notion of factor is in fact universal and present in various asset classes¹. Interestingly enough, factors emerged from this precise asset class, equities. Aggregating the returns of different assets into a portfolio gives rise to the influence of factors. In their simplest expression, factors are simply portfolios of securities. In a nutshell, factors are the sources of common exposures among asset classes. Factors are built by grouping securities along characteristics that they may share, whether those characteristics are statistical, fundamental, descriptive, etc. Depending on the type of factor, those characteristics may be either explicit or statistical, estimated or known in advance, objective or subjective. Factors may be built from cross-sectional properties, or can be obtained from an economic approach involving inflation, growth or interest rates. They may span countries, sectors and the aforementioned asset classes.

Factors should be elementary, non-idiosyncratic sources of returns that shape asset returns and their risk characteristics. Factors should explain the intra-asset-class cross-correlation. Factors are indeed widely used to understand the similarities in risk and return properties between securities. An interesting exercise is the examination of the "forensic" statistics of the last two financial crises. During the dot-com crisis of 2000-2003, it is documented that approximately half of the stocks posted a negative performance, representing nearly 75% of the total market capitalisation of the S&P500 index. During this period, small and value stocks were up, whereas tech stocks were hammered. During the financial crisis of 2008-2009, 95% of the stocks, representing more or less the same share of the market capitalisation of the same index, experienced negative performance. This time, small and value stocks were essentially down. This illustrates two things: first, a market crash does not mean that every stock will be hit; second, factors (here Size and Value) may behave very differently in some market phases, including downturns. The drivers of this intermittent performance, or of drawdowns, are various and subtle to identify. This difficulty in timing factors is mainly due to the actual interconnectedness of markets, assets and players: funds, banks, asset managers, etc. The liquidity of equity markets means that many investors have easy access to a readily available liquidity pocket. Even if this need for liquidity has exogenous reasons, it may have consequences for equity markets in the end.

Most of the time (but not always), factors are unobserved, given their latent nature, and require a model to estimate them or an ex ante formulation of their composition. Factors are generally understood within the same asset class. But even if those factors have been identified originally in the equity sphere, they may generalize well, as explained e.g. in Asness et al. (2013), where the authors find that those factors are indeed quite universal across asset classes (they identify a common link through the channel of liquidity risk that explains their correlation across markets and across asset classes).

¹ An asset class may be thought of as an asset category, such as equities, bonds, real estate, commodities, etc., based on their intrinsic characteristics and their very nature. In an ideal world, asset classes are independent and should represent the whole possible investment universe.

Take-Home Message

Factors have become a deep trend in the asset management industry and powerful investing tools. It is not possible to disentangle the theoretical definition of factors from the trends and habits of the financial industry. At this point, we may understand factors as very crude proxies of transparent, publicly known and long-standing strategies, but also as risk engines explaining the cross-section of equity returns. This picture will be refined further, but let us keep it this way for the moment. Factors are both a research tool and an easy access to rule-based investing. It is important to understand this duality.

1.2.3 Active management? Not everywhere!

The contradictions of the academic literature also appear in the contrasting results on factor timing (timing a factor meaning being able to forecast which factor is going to perform or not in the near future). We will (of course) come back to this topic later, yet an early work of Daniel et al. (1997) finds that mutual fund managers may add some alpha with pertinent stock picking rather than with effective factor timing, whereas some years later Jiang et al. (2007) find some evidence that mutual funds do gain performance thanks to their timing ability. In fact, using a superior framework for information processing, they may appear more concentrated and play some industry bets. This is quite in line with the arguments of Wimmer et al. (2013), who find that performing, actively managed funds may experience alternating periods of poor performance with no effective alpha, after several consecutive years of successful investments. Put another way, performance may come from an ability to time, or take benefit from, an identified economic cycle or regime.

Investing in factors and effectively timing them at a high frequency is of course a form of active management. But not every fund manager shifts towards active management. In 2016, the lack of active management was identified by the suite of proprietary SPIVA Scorecard reports delivered over the years by S&P Dow Jones. The proportion of active funds turns out to be very small, less than 5% of funds in some countries. For instance, Ung et al. (2015) identify an inverse link between the holding period of a fund and the ability of the best funds to remain top performers. A vast majority of funds (including in Europe or the US) cannot outperform their benchmarks over medium or long periods (typically 3, 5, or 10 years). In the 80s, most mutual funds were genuinely active: with more and more available data, increased automated trading and more anomalies discovered, it may seem paradoxical to observe that the proportion of indexed funds with passive management has in fact increased since then. But it is quite natural to think that with little active management, there is little space for pure alpha generation. However, as Cremers and Petajisto (2009) note, the active bets of mutual funds as a whole remain significant when aggregated.

Recently, Clare et al. (2013b) found that, in recent years, a rather large number of alternative indexing methods would have delivered better risk-adjusted performance than the traditional market-capitalization index, and even that many random choices could have led to more satisfying a posteriori results than the market-cap index! Their point is that since the end of the 1990s and the beginning of the 2000s, market-cap indices have provided disappointing results in absolute terms. See also Chow et al. (2011).

Figure 1.1: Historic evolution of the semantic vision of performance decomposition through alpha and beta. This graph also appears in Ielpo et al. (2017). Alternative Risk Premia will be defined in Section 3.4.1.

1.2.4 From Factors to Smart Beta

After years of academic work, as we will detail in the upcoming chapters, this research led to the appearance of investable products. The academic research is old: after a quiet period from 1934 to 1981, activity bubbled in the 80s and a wave followed in the 90s. But if the first investable products may be Russell's ETFs backed by style boxes, detailed supra, a huge trend started after the work of Ang et al. (2009a). While not a theoretical breakthrough, this joint work of Ang, Goetzmann and Schaefer for the Norwegian sovereign fund (one of the biggest in the world, with US$650 billion under management) had a tremendous impact on the financial industry. The report made a case for risk-factor investing and the use of passive-like investment strategies to enhance traditional portfolios. As stated by Ang (2014), "factor based investment is a new paradigm for long term investment". Some years later, Bender et al. (2014) identified that the celebrated Fama-French factors explained approximately half of the average extra-performance of US institutional fund returns.

Let's have a look at Figure 1.1. In a way, this graph is not ours: its sources are numerous (see for instance Bender et al. (2014), among others, for a close inspiration). This graph may be found in very similar forms in numerous papers and theoretical or professional presentations, with an interesting variation in words, terms and labels. What interests us, for the sake of clarity, is the shared consensus on the evolution of the way people talk about factors or, under an alternative name, risk premia. The main message of this evolution is that a large portion of what was traditionally considered as alpha in the past has now turned into a mainstream alpha that is considered as being beta. Alpha, or performance generation, is therefore a matter of time and risk analysis.

Following the CAPM inspiration of this decomposition, the beta, or market return, was the part of performance that may be explained by market moves through the sensitivity/exposure to those moves. Beta was merely conceived as the part of the returns coming from passive exposure to market-capitalization indices. Alpha, the residual part, was extra performance due to active management. Formerly, alpha was mainly asset selection, with good performance believed to be due only to the manager's skill and abilities. This alpha-plus-beta decomposition induced a quest for alpha among managers, as expressed for instance in Montier (2007). Market returns may be described through their explicit exposures to regional, sectorial and economic risks (exposure to inflation, to some geographical zones, etc.). All of this is more descriptive, and is not an explanation per se.

Therefore, alpha, or extra-performance created by skilful managers, has always been a concept that is relative across time. The more it is possible to identify the sources of risk (or bets) a manager is playing, the less freedom it leaves him to stand out of the crowd as a performing one. With the democratization of style indices and their diffusion as financial products, they themselves quickly appeared as common or plain vanilla. This however left some space for talented managers to generate performance through an even more active management.

The performance that is created is always a function of what is easy to identify or to generate "at a cheap cost" (that is to say, with simple, audited data, using vanilla procedures and techniques). Alternative risk premia therefore include alternative indexation (fundamental indexation, see e.g. Clare et al. (2013a)) but also all the more or less sophisticated but systematic, reproducible and identified strategies. This builds a space for a whole stream of strategies and potential indices whose risk is compensated for by expected returns. Let us underline that those alternative risk premia are not necessarily a hedge against market downturns and are commonly exposed to global economic and market fluctuations and, potentially, to unforeseen slow-downs in some periods. Products flagged as alternative risk premia obviously aim at improving returns and increasing diversification. But two things are really modified in the financial landscape.

First, the notion of passivity itself, since passive investing is gradually shifting from a market-capitalization weighting notion to a broader one encompassing factors in a rule-based, transparent way and with low implementation costs. Second, the traditional alpha component of portfolio returns is now understood as compensation for identified risk exposures and non-diversifiable risk. We even sometimes find these exposures under the term "units of risk".

Factor-related strategies are now available through ETFs and are becoming increasingly popular. Those products are now generally known under the term Smart Beta products, indices and ETFs. Smart Beta is used here as a generic term since the terminology of funds remains rather vague and products are at the heart of heavy marketing; we will clarify the terms in Section 3.4.1. Let's have a look at some figures. The numbers detailed infra have been collected from Bryan et al. (2019), Invesco (2016), Moghtader (2017) and Shores (2015).

We saw supra that in 2000 we were just at the beginning, with ETFs allowing tilts towards Size and Value. The first true Smart Beta ETF is believed to have been introduced in 2003, while the first fundamentally-weighted ETF (see Arnott et al. (2005) for a definition) was created in 2005. In 2011, the first ETF weighted along volatility considerations was created (see on this topic Demey et al. (2010)). It has been estimated that a few years later, in 2013, Smart Beta ETFs represented a global asset under management of around $350 billion, becoming $529 billion by the end of 2014, with nearly 700 available ETFs. In 2016, 70% of institutional investors were using, or were planning to increase, their investments in factor-related strategies. In 2017, it was estimated that the assets under management had doubled since 2010. Today, we estimate that if (really) passive investment represents roughly 15% of the total assets under management, Smart Beta represents around 1500 investable products for $800 billion under management, roughly 10% of the total assets under management. An estimated evolution of assets is available in Figure 1.2, taken from Bryan et al. (2019). Those figures show a rising interest in those products, with an incredible speed of increase of the assets.

Figure 1.2: US Smart Beta ETPs asset growth. Taken from Bryan et al. (2019).

1.2.5 Finally, is Alpha Dead?

Before delving into factor theory within this modest course, we should ask a question: is alpha dead? Are factors sufficient to navigate the financial landscape nowadays? Paradoxically, we believe that this whole story argues in favour of the use of proprietary strategies, since there are many ways to access performance and things are not static: dynamic management may be the key to future profits. The role of an asset manager today is still to generate performance, with smart security selection, coherent bets, precise market timing and, of course, a robust and repeatable investment process. The beta exposures to all kinds of strategies and risks have to be carefully monitored. This also means that the fees paid by investors have to be justified and associated with a real, persistent and convincing alpha creation.

Wermers et al. (2012) have shown that an analysis of individual securities can generate profits that cannot be obtained through industry, sector or factor exposures alone. However, investors have to be aware that the use of new styles or new strategies almost always comes at the cost of capacity (approximated, say, by the maximal amount of dollar-risk you may take for a given acceptable level of dollar-gain). Capacity may be limited depending on the strategy that is used. Departing from the wide-capacity market-capitalization indices comes at a cost. To underline the increasing pressure of competition on alpha generation, Lo (2016) notes that "competition suggests that alpha should be capacity-constrained, hard to come by and expensive". Generating new sources of alpha is more and more difficult, requires a careful process and skills, and this should have a price for the investor wanting to access superior levels of performance.

1.2.6 Structure of this course

This course is essentially separated into two parts. After this introduction and a short excursion to elaborate on the econometric tools needed to handle the estimators and statistical objects that we will need along the way (Chapter 2), the first part is dedicated to the study of factors, with in particular their theoretical definition in Chapter 3. In Chapter 3 we will try to understand how factors appeared in the stream of the CAPM, and what semantics the financial industry uses for them, with the description of the three types of factor models. After this presentation, we will make a little interlude in Chapter 4, which deals with backtesting. This has, in principle, nothing to do with factors, except that it gives the basics of simulation and insights on the right attitude to adopt as a researcher when trying to dig into a potential anomaly candidate. Afterwards, we will present some of the most famous equity factors with some modest numerical experiments (Chapter 5). To conclude this first part, we will compile the criticisms made of factors (data mining, implementation costs, etc.) and of their intensive marketing (this being done in Chapter 6). The second part is dedicated to equity heuristics and stylized facts. Many of those equity factors are based on constants that are really related to the life of companies and stocks (volatility, dividends, etc.), so we list in Chapter 7 the most famous stylized facts on equity markets, and in Chapter 8 some important quantities to have in mind (mean level of dividends, earnings and multiples, etc.). We conclude with a dissection of the structure of covariance matrices, which are one aspect of statistical factors and a potentially crucial ingredient for portfolio allocation (Chapter 9).

Let’s start!

Main Bibliography

• Fama (1998)

• Hirshleifer (2001)

• Schwert (2003)

• Malkiel (2003)

• Durlauf et al. (2008)

• Fama and French (2008)

• Ang (2014)

Chapter 2

A Statistical Toolkit

The purpose of this chapter is to provide the elementary statistical tools to understand the basics of equity and factor modelling from a theoretical perspective, and to pave the way to the study of stylized facts as presented in Chapter 7. Of course, it is not compulsory to read this chapter to understand factors, but it is in our view a complementary reading that introduces the statistical objects at play when defining them.

One will find a stochastic and sophisticated mathematical modelling of the price process in Karatzas and Shreve (1998), for instance, but the purpose of this chapter is much simpler: it is to understand how to compute the returns of an observed time series and what the consequences of the possible choices are. We assume throughout the chapter that we follow a pool of (sufficiently) liquid stocks, with generic notation $i$ for a given stock.

We define a random variable, the return $R_{i,t+1}$ at date $t+1$, that quantifies the increments of the price process $(P_{i,t})$ of the stock. The return should depend on the change in value of the stock over the period, but also on the potential payoffs linked to the holding of the stock over the given period. The definition of the return is not unique, and returns are always defined relative to a time scale: fixing the length of the period determines the time scale and the "dimension" of the computed returns. We assume that the process $(P_{i,t})$ is observed on a discrete-time grid $(..., t-1, t, t+1, t+2, ...)$ where the time-step of the grid is fixed (day, week, month, etc.). The price of stock $i$ at date $t$ is written $P_{i,t}$, with the potential delivery of a dividend equal to $D_{i,t}$ over the period $[t-1; t]$.

2.1 Modelling equity returns

2.1.1 Discrete-time modelling

In discrete time, the random value $R_{i,t+1}$ of the return at date $t+1$ is defined from the price process as:

$$R_{i,t+1} = \frac{P_{i,t+1} - P_{i,t}}{P_{i,t}} + \frac{D_{i,t+1}}{P_{i,t}}.$$

The first term is the relative gain or loss in capital. The second is the dividend rate. For a stock, $D_{i,t+1}$ is positive or equal to zero, not stochastic but not known in advance, set by the company, and may lead to the payment of taxes. The discrete-time modelling leads to returns that are often called arithmetic or simple returns.

Now, let us consider that we want to compute the same kind of return, but over $H$ consecutive periods. We will note this return:

$$R^{(H)}_{i,t+H} = \frac{P_{i,t+H} - P_{i,t}}{P_{i,t}} + \frac{D_{i,t+H}}{P_{i,t}}.$$

Both processes $(R^{(H)}_{i,t})_t$ and $(R^{(1)}_{i,t})_t = (R_{i,t})_t$ do not have the same dimension, but they are related through a non-linear relation since:

$$R^{(H)}_{i,t+H} = \prod_{k=1}^{H} \left(1 + R_{i,t+k}\right) - 1.$$

We have here a first message: even with the most natural way of defining returns, the simple return definition does not aggregate linearly in the time dimension, since:

$$R^{(H)}_{i,t+H} \neq \sum_{k=1}^{H} R_{i,t+k}.$$

A first-order expansion shows that when the realizations of $(R_{i,t})$ are close to zero and $H$ is small, the error made by this approximation is generally weak in practice.

2.1.2 Continuous-time modelling

With a continuous-time modelling of the price process, at each moment $u \in [t; t+1]$ the instantaneous performance is assumed to be constant, equal to $r_{i,t+1}$. On an elementary period $[u; u+du]$, if $dP_{i,u}$ is the elementary increase in price at $u$, we have:

$$\frac{dP_{i,u}}{P_{i,u}} = r_{i,t+1}\, du.$$

Integrating between $t$ and $t+1$, this gives:

$$r_{i,t+1} = \log\left(\frac{P_{i,t+1}}{P_{i,t}}\right).$$

This defines geometric or log-returns. A first-order expansion of the function $x \mapsto \log(1+x)$ gives $\log(1+x) \approx x$ for $x$ close to 0. The quantity $(P_{i,t+1} - P_{i,t})/P_{i,t}$ has no dimension and is close to 0, therefore giving:

$$\log\left(\frac{P_{i,t+1}}{P_{i,t}}\right) = \log\left(1 + \frac{P_{i,t+1} - P_{i,t}}{P_{i,t}}\right) \approx \frac{P_{i,t+1} - P_{i,t}}{P_{i,t}}.$$

We see that $r_{i,t+1} = p_{i,t+1} - p_{i,t}$ if $p_{i,t} = \log(P_{i,t})$. This definition of the return is exactly linear in the price increments as soon as we use log-prices rather than raw prices¹. The key property of geometric returns appears: they are linear in the time dimension, contrary to simple returns. Indeed, if $(r^{(H)}_{i,t})_t$ is the $H$-period geometric return process, then:

$$r^{(H)}_{i,t+H} = \log\left(1 + R^{(H)}_{i,t+H}\right) = \sum_{k=1}^{H} r_{i,t+k}.$$

¹ It is usual to see in the literature an implicit consensus on notations that uses lowercase letters for log-returns or log-prices. We keep those notations for this paragraph only, since everywhere else we prefer the widespread convention of uppercase letters for random variables and lowercase letters for observations.
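As a quick numerical check of these aggregation properties, the sketch below (on a hypothetical synthetic price path, without dividends) compares the multi-period simple return obtained by compounding with the naive sum of one-period simple returns, and verifies that log-returns sum exactly over time.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical price path: 21 daily returns for one stock, no dividends.
prices = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, 21))
prices = np.insert(prices, 0, 100.0)

simple = prices[1:] / prices[:-1] - 1          # one-period arithmetic returns
log_ret = np.diff(np.log(prices))              # one-period geometric returns

# H-period aggregation over the whole sample.
R_H_exact = prices[-1] / prices[0] - 1         # true multi-period simple return
R_H_compound = np.prod(1 + simple) - 1         # product of (1 + R) minus 1
R_H_naive_sum = simple.sum()                   # linear sum, only an approximation
r_H_sum = log_ret.sum()                        # log-returns sum exactly

print(f"compounded simple return : {R_H_compound:.6f} (exact: {R_H_exact:.6f})")
print(f"naive sum of simple ret. : {R_H_naive_sum:.6f} (approximation error)")
print(f"sum of log-returns       : {r_H_sum:.6f} (equals log(1 + exact return))")
```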

12 2.2 A first glance at equity returns’ moments

Returns and their moments play a major role in equity modelling, and estimating them is therefore an important topic. Why are they so important? In practice, the Gaussian assumption for returns has been a widespread hypothesis for years. As Gaussian laws are fully characterized as soon as their mean and variance are fixed, this explains (though not entirely, of course) why estimating the mean and variance of returns has been a prerequisite for studying equity returns. We will see later that this Gaussian modelling is of course outdated.

We understand the moment of order $m$ of a distribution here as the probabilistic expectation of the random variable raised to the power $m$. From a theoretical point of view, having all the moments of a given distribution should help to characterize it (even if, theoretically, it may not always be sufficient to determine it). From the practical side, estimating moments may be a guideline to understand the returns: it allows one to know whether the return distribution matches the usually observed stylized facts (see Chapter 7). It may also allow one to test for the stationarity or the nature of the return distribution. In the financial world, return moments are extensively used in risk management, forecasting and allocation.

With a finite sample, computing the sample mean of a time series, even raised to a given power, seems pretty easy. But if we want to estimate moments, we have at least to assume that they exist! Financial returns are in general non-Gaussian: the tails of return distributions are heavier than the tails of a Gaussian law, and there is an asymmetry in extreme returns, negative extreme returns being relatively more frequent than positive extreme returns. And the fatter the tails, the less likely it is that higher moments can be computed. In some cases, distributions with extremely fat tails may not have a well-defined skewness or variance. The first moment of the return distribution, the expected return, is understood as a proxy of performance and is linked to the drift of the price process. The second moment, the variance, has historically been understood as a proxy of risk. Moreover, the historical use of mean-variance portfolio allocation also drove the particular interest in mean and covariance as proxies of performance and risk.

There is a subtle distinction to make about the third and fourth moments. They are rarely estimated per se. Practitioners rather prefer to use skewness and kurtosis, which are respectively the third and fourth moments, not of the raw distribution itself, but of the centered and normalized distribution. Skewness, which we will note $s$, measures the degree of asymmetry of the return distribution relative to its mean. It is a signed quantity and is not necessarily defined for every distribution. Its interpretation also depends on the nature of the distribution, but for uni-modal distributions a negative skewness is in general the sign that the left tail of the distribution has more weight than the right tail. Kurtosis, expressed here through the notation $k$, measures the thickness of the tails of the return distribution. It also describes the shape of the distribution. A high kurtosis means for instance that volatility is driven by infrequent but extreme moves rather than mild but frequent moves. For both skewness and kurtosis, estimation is delicate and the choice of estimator is heavily debated, sometimes depending on the context.

2.3 Usual statistical assumptions on returns’ distribution

In this section we briefly describe the consequences of common statistical assumptions on equity returns. We are not interested here in finding the statistical law that fits the data best, but rather in seeing what is implied for portfolios by assumptions such as Gaussian returns. A crucial remark is that those assumptions are always made on returns: it is nearly impossible to postulate a stationarity assumption on prices, so statistical assumptions are always made on price increments or on returns. We will focus on a simple set of three assumptions with a decreasing level of universality. Those assumptions (Assumptions 2.1, 2.2, and 2.3) are fairly standard in financial econometrics. We look here at the distribution at date $t$ of an $N$-asset return vector, which we note $R_t$. Depending on their strength, those assumptions may help to: 1) propose estimators and models for analysis and forecasting; 2) get an asymptotic diagnosis on the return moment estimators (independence, asymptotic distribution, asymptotic moments); 3) get a diagnosis on estimators in finite samples; 4) recover the distribution of returns when they are aggregated along the time or the asset dimension.

Assumption 2.1 $(R_t)$ is said to be stationary at the second order (or of order 2) if the probability law of $(R_t, R_{t+H})$ depends on $H$ but not on $t$. In particular, the moments of the marginal law of $R_t$ at date $t$ do not depend on $t$. The expectation and (co)variance of $R_t$ are independent of time. So is the autocovariance of $(R_t)$, $\mathrm{Cov}(R_t, R_{t+H})$, which is a function $\Gamma_H$ of $H$ only and may be non-trivial:

$$\mathrm{E}[R_t] = \mu, \qquad \mathrm{V}[R_t] = \Omega, \qquad \mathrm{Cov}(R_t, R_{t+H}) = \Gamma_H,$$

with $\mu = (\mu_1, ..., \mu_N)'$ and $\Omega = (\sigma_{i,j})_{(i,j) \in [1;N]^2}$.

Assumption 2.1, even if it appears rather weak, is likely to fail in general for financial returns, due at least to the existence of the volatility clustering effect (see Mandelbrot (1963)). Volatility clustering stipulates that volatility switches between low and high regimes, meaning that the second moment of the return distribution changes with time, in opposition to Assumption 2.1. Moreover, the existence of different trends in any market is another counter-example to the constancy through time of the first moment of the return distribution. Assumption 2.1 is the definition of stationarity of order 2, which is the most common and the weakest form of stationarity. This assumption remains an important statistical tool for estimation and analysis.

In the following, we will understand by autocorrelation of order $H$, where $H$ is a natural integer, the function $\gamma(H)$ defined as follows:

$$\gamma(H) = \frac{\mathrm{Cov}(R_t, R_{t+H})}{\mathrm{V}[R_t]}. \qquad (2.1)$$

For a given time series, the autocorrelogram is the graph mapping the order $H \geq 0$ to the empirical value of $\gamma(H) \in [-1; 1]$ for the time series, over a reasonable range of $H$.
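A minimal sketch of an empirical autocorrelogram, computed on a hypothetical synthetic return series; the estimator below is the standard sample analogue of Equation (2.1).

```python
import numpy as np

def autocorrelogram(x, max_lag=20):
    """Empirical autocorrelations gamma(H) for H = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = x @ x / len(x)
    acf = []
    for h in range(max_lag + 1):
        # sample autocovariance at lag h, normalized by the sample variance
        cov_h = (x[h:] @ x[:len(x) - h]) / len(x)
        acf.append(cov_h / var)
    return np.array(acf)

# Hypothetical AR(1)-like return series, only for illustration.
rng = np.random.default_rng(2)
eps = rng.normal(0, 0.01, 1000)
r = np.zeros(1000)
for t in range(1, 1000):
    r[t] = 0.3 * r[t - 1] + eps[t]

print(np.round(autocorrelogram(r, max_lag=5), 3))
```

Plotting these values against the lag $H$ gives the autocorrelogram described above.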

Assumption 2.2 In addition to Assumption 2.1, returns $(R_t)_t$ are assumed to be independent through time. A consequence is that $\Gamma_H$ is in particular zero-valued as soon as $H \neq 0$.

Assumption 2.2 is already very strong. It corresponds to the definition of strict stationarity: it means that the joint statistical distribution of any collection of the time series variates never depends on time. Then $R_t$ follows a distribution that does not depend on time. This definition is very strict and difficult to test; most experiments rely in practice on the weaker definition of stationarity given in Assumption 2.1.

Assumption 2.3 In addition to Assumption 2.2, the law of returns $(R_t)$ at date $t$ is assumed to be Gaussian, following at each date $t$ a Gaussian distribution $\mathcal{N}(\mu, \Omega)$.

The assumption of Gaussian returns will not necessarily be stated in the following. In the case of a single asset $i$, if we formulate a Gaussian assumption on $R_{i,t}$, we see that, having observed $p_{i,t-1}$, the price $P_{i,t}$ of asset $i$ follows in this case a Gaussian distribution with mean $p_{i,t-1}(1 + \mu_i)$: the resulting distribution puts mass on negative values, which means that random future prices, under this assumption, may be negative! This is not specific to the Gaussian distribution, and results from the choice of arithmetic returns. It shows however that if such an assumption has to be made on the distribution of returns, the geometric definition of returns is better suited, since the recovered prices are then always positive.
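To make this remark concrete, the sketch below (with purely hypothetical parameter values) computes the probability of a negative future price when the arithmetic return is Gaussian, and contrasts it with the log-return case, where the price is lognormal and this probability is zero.

```python
from math import erf, sqrt

def gaussian_cdf(x):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Hypothetical parameters: previous price, mean and volatility of the
# arithmetic return over the period (illustration only).
p_prev, mu, sigma = 100.0, 0.05, 0.60

# P_t = p_prev * (1 + R) with R ~ N(mu, sigma^2):
# P(P_t < 0) = P(R < -1) = Phi((-1 - mu) / sigma)
prob_negative = gaussian_cdf((-1.0 - mu) / sigma)
print(f"P(price < 0) under Gaussian arithmetic returns: {prob_negative:.4%}")

# With geometric returns, P_t = p_prev * exp(r) > 0 whatever r, so the
# probability of a negative price is exactly zero.
```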

2.4 Aggregating returns

What are the aggregation properties of returns? We briefly explore here the way they sum up, both in the time and in the asset dimension. Let's assume that we work with a portfolio of $N$ assets, with weights $w \in \mathbb{R}^N$ whose components $w_i$ are not necessarily positive and may be either quantities or relative proportions. We note respectively $P_{i,t}$, $R_{i,t}$ and $r_{i,t}$ the price, the one-period arithmetic return and the one-period geometric return of asset $i = 1, ..., N$ at date $t$.

2.4.1 Aggregation in the asset-dimension on one period

Things are simple when we work with arithmetic returns. In this case the one-period arithmetic return $R^P_{t+1}(w)$ of the portfolio at date $t+1$ is:

$$R^P_{t+1}(w) = \sum_{i=1}^{N} w_i R_{i,t+1}.$$

Unfortunately, geometric returns do not have this property, since in the most general case:

$$\log\left(\sum_{i=1}^{N} w_i \frac{P_{i,t+1}}{P_{i,t}}\right) \neq \sum_{i=1}^{N} w_i \log\left(\frac{P_{i,t+1}}{P_{i,t}}\right).$$

Consequently, we have in general:

$$r^P_{t}(w) \neq \sum_{i=1}^{N} w_i r_{i,t},$$

(even if this relation may approximately hold in practice when returns have a low amplitude).
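The following sketch illustrates this on hypothetical numbers: the portfolio arithmetic return is exactly the weighted sum of the assets' arithmetic returns, whereas the portfolio log-return differs (slightly, for small returns) from the weighted sum of the assets' log-returns.

```python
import numpy as np

# Hypothetical one-period data for N = 3 assets (weights are proportions
# summing to 1, prices chosen arbitrarily).
weights = np.array([0.5, 0.3, 0.2])
p_start = np.array([100.0, 50.0, 20.0])
p_end = np.array([103.0, 49.0, 20.8])

simple = p_end / p_start - 1
log_ret = np.log(p_end / p_start)

# Arithmetic returns aggregate linearly across assets...
port_simple = weights @ simple
# ...whereas the portfolio log-return is not the weighted sum of log-returns.
port_log_exact = np.log(1 + port_simple)
port_log_naive = weights @ log_ret

print(f"portfolio simple return      : {port_simple:.6f}")
print(f"portfolio log-return (exact) : {port_log_exact:.6f}")
print(f"weighted sum of log-returns  : {port_log_naive:.6f}")
```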

2.4.2 Aggregation in the time-dimension on a single asset

With a single asset, geometric returns are linear with respect to time aggregation, which is not the case for simple returns. The linearity comes mainly from the choice of the return, not from an independence assumption. If we formulate a strong assumption of independence and Gaussianity as in Assumption 2.3, working with $w$ as quantities, on an $H$-period horizon and with geometric returns, Gouriéroux et al. (1997) compute explicitly the law of evolution of log-prices: we are in a framework that is similar to that of Black and Scholes. The main message is that in this case we obtain the normality of log-prices, with both moments depending linearly on $H$. In particular, under the no-rebalancing assumption (quantities fixed), the return per unit of time of asset $i$ has a decreasing variance per unit of time.

2.4.3 General case

We will retain here the general result that arithmetic returns are not linear in the time dimension but sum linearly in the asset dimension. It is the opposite for geometric returns, which sum linearly in time but not in the asset dimension. The general case is much more sophisticated: see Chapter 1 of Gouriéroux et al. (1997), which explores the statistical behaviour of portfolios (laws, moments, expressions) in various situations, with or without a Gaussian assumption. It is in particular stated that the arithmetic definition of returns is more useful in the general case, since computations in the multi-stock case become much more difficult otherwise. This explains why arithmetic returns are often chosen in practice for statistical inference.

2.5 Moment Estimation

A statistical challenge is to estimate the moments of the return distribution. We are interested here in the definition of estimators of the moments of the unconditional distribution of returns, independently of any dynamic (conditional) modelling.

2.5.1 Sample counterparts

It is always possible to compute weighted averages of a finite time series to obtain a given quantity. We hope of course that this quantity will represent an acceptable estimator of the corresponding moment of the theoretical distribution. Sample counterparts are the empirical equivalents of theoretical moments. For a given stock $i$ and a one-dimensional real-valued time series of returns $(R_{i,t})$ with $T$ observations $(r_{i,1}, \ldots, r_{i,T})$, the sample counterpart $\hat{m}^{(1)}_{i,T}$ of the expected return (first moment) is given by:

$$\hat{m}^{(1)}_{i,T} = \frac{1}{T}\sum_{t=1}^{T} r_{i,t}.$$

The sample counterpart of the second moment is given by:

$$\hat{m}^{(2)}_{i,T} = \frac{1}{T}\sum_{t=1}^{T} \left(r_{i,t} - \hat{m}^{(1)}_{i,T}\right)^2.$$

ˆ3 ˆ4 The sample counterpart for skewness, mˆ i,T and for kurtosis, mˆ i,T will therefore be simply obtained through: N ˆ  ˆ13 N ˆ  ˆ14 ˆ  1 Pt 1 ri,t mˆ ˆ  1 Pt 1 ri,t mˆ mˆ 3 i,T and mˆ 4 i,T . i,T 3~2 i,T 2 N ˆ ˆ2 N ˆ ˆ2 mˆ i,T mˆ i,T Those sample counterparts are obviously natural estimators for expected return, variance/volatility, skewness and kurtosis. They have for a fixed T a bias that is generally forgotten in practice since it vanishes quickly with T . The question is now to see if there are other estimators, if they coincide with the previous expressions, and if the theory can help us to infer on their statistical properties.

2.5.2 Estimation under the Gaussian assumption

We refer here to the discussion of Section 2.3 and assume that the returns $(R_t)_t$ are $N$-dimensional real-valued vectors which follow an i.i.d. Gaussian distribution $\mathcal{N}(\mu, \Omega)$ with $\mu \in \mathbb{R}^N$, $\Omega$ an $N \times N$ real-valued square matrix, and observations $(r_1, \ldots, r_T)$. We are then under Assumption 2.3. The Gaussianity assumption serves not only a purely computational purpose, but will also help us to understand the statistical properties of the estimators. Should this assumption be relaxed, estimation and asymptotic formulas would still be possible to obtain, but potentially with less clarity.

It is possible to estimate the moments $(\mu, \Omega)$ thanks to Maximum Likelihood Estimation (henceforth MLE). MLE is fairly standard in many statistical textbooks (see e.g. Kendall and Stuart (1977) or Florens et al. (2007)), especially for the multivariate Gaussian law. The computation we re-do here is quite classical and is already presented in a financial context in Roncalli (2013). The likelihood $\mathcal{L}$ for a Gaussian distribution, as a function of the parameters and conditionally on the observations, writes:

$$\mathcal{L}(\mu, \Omega) = \prod_{t=1}^{T} \frac{1}{\sqrt{(2\pi)^N \det(\Omega)}}\, e^{-\frac{1}{2}(r_t - \mu)'\Omega^{-1}(r_t - \mu)}$$

$$= \frac{1}{\left((2\pi)^N \det(\Omega)\right)^{T/2}}\, e^{-\frac{1}{2}\sum_{t=1}^{T}(r_t - \mu)'\Omega^{-1}(r_t - \mu)}.$$

The log-likelihood $\ell_0$ is:

$$\ell_0(\mu, \Omega) = -\frac{NT}{2}\log(2\pi) - \frac{T}{2}\log(\det(\Omega)) - \frac{1}{2}\sum_{t=1}^{T}(r_t - \mu)'\Omega^{-1}(r_t - \mu). \qquad (2.2)$$

The estimators $(\hat{\mu}_T, \hat{\Omega}_T)$ are the values of the parameters, expressed as functions of the set of observations, that maximize Equation (2.2). At the optimum, the two derivatives of the log-likelihood with respect to the parameters have to be equal to zero. Differentiating $\ell_0(\mu, \Omega)$ with respect to the parameter $\mu$ and setting the derivative to zero gives:
$$\hat{\mu}_T = \frac{1}{T}\sum_{t=1}^{T} r_t.$$

In the same way, plugging $\mu = \hat{\mu}_T$ into $\ell_0(\hat{\mu}_T, \Omega)$ gives a scalar-valued function of $\Omega$ only. We can replace any term $(r_t - \mu)'\Omega^{-1}(r_t - \mu)$ by its trace, omit the constant term (which will disappear when differentiating), and use the commutation properties of the trace function. We finally get:

$$\ell(\Omega) = -\frac{T}{2}\log(\det(\Omega)) - \frac{1}{2}\,\mathrm{tr}\left(\Omega^{-1}\sum_{t=1}^{T}(r_t - \hat{\mu}_T)(r_t - \hat{\mu}_T)'\right).$$
If we differentiate with respect to $\Omega$ and set the derivative to zero, we finally get the expression of the estimator of $\Omega$:
$$\hat{\Omega}_T = \frac{1}{T}\sum_{t=1}^{T}(r_t - \hat{\mu}_T)(r_t - \hat{\mu}_T)'.$$
Under a Gaussian assumption, maximum likelihood estimators and sample counterparts thus coincide for the first moment (expected return) and the second moment (variance). For an extended discussion in the general case, see Ielpo et al. (2017).
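A quick way to convince oneself of this coincidence is the following sketch (Python, simulated Gaussian data; the dimensions and true parameters are arbitrary assumptions), which compares the MLE of $\Omega$ with the biased (1/T) sample covariance:

import numpy as np

rng = np.random.default_rng(1)
N, T = 4, 1000
mu_true = np.array([0.01, 0.00, -0.01, 0.02])
A = rng.normal(size=(N, N))
Omega_true = A @ A.T / N                                   # some positive-definite covariance
r = rng.multivariate_normal(mu_true, Omega_true, size=T)   # T x N observations

mu_hat = r.mean(axis=0)                                    # MLE of mu = sample mean
centered = r - mu_hat
Omega_hat = centered.T @ centered / T                      # MLE of Omega (1/T, not 1/(T-1))

print(np.allclose(Omega_hat, np.cov(r, rowvar=False, bias=True)))   # True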

2.5.3 Volatility estimators

There are several ways to estimate volatility (the square root of variance). As detailed in Section 9.1.3, it is possible to estimate correlation and volatility separately in order to estimate covariance. We present in this paragraph some useful contributions on alternative estimators of individual volatilities, different from the sample estimator. We focus here on historical volatility estimators and not on more sophisticated views of volatility using, for instance, options data.

Working with daily data, the first option is to compute volatility either under a zero-drift assumption or with the drift estimated as the mean return on the sample. Another possibility is to compute volatility as an exponential moving average of the time series of squared returns: this choice is quite frequent among practitioners, and a review is available in Alexander (2008).
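A minimal sketch of such an exponentially weighted estimator is given below (Python; the decay parameter lam = 0.94 and the initialisation window are assumptions in the spirit of common practice, not prescriptions from the text):

import numpy as np

def ewma_volatility(returns, lam=0.94):
    # Exponentially weighted moving variance of squared returns (zero-drift assumption),
    # returned as a volatility path of the same length as the input series.
    var = np.var(returns[:20])                    # crude initialisation on the first points
    out = []
    for r in returns:
        var = lam * var + (1.0 - lam) * r ** 2
        out.append(np.sqrt(var))
    return np.array(out)

rng = np.random.default_rng(2)
daily_returns = rng.normal(0.0, 0.01, size=500)              # hypothetical daily returns
print(ewma_volatility(daily_returns)[-1] * np.sqrt(252))     # annualised, around 16% here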

Another possibility is to use intra-day data. The advantage is that more data and observations yield more precise estimators. Parkinson (1980) used high and low prices rather than daily close-to-close prices. The estimator is the square root of the (normalized) sum of squared logarithms of the ratio of highest to lowest prices. Garman and Klass (1980) corrected the squared log ratios of high to low prices with another log term of the ratio of close to open. Another proposition for a historical volatility estimator using open, high, low, and close prices is available in Rogers and Satchell (1991).

The Garman-Klass and Rogers-Satchell estimators are now widespread in the literature but are quite inappropriate for handling volatility on price processes with jump components. Yang and Zhang (2000) provided a correction to those estimators that may account both for a non-zero drift and for jumps. This estimator allows in particular an overnight/intra-day decomposition of the volatility. Interested readers can also find further contributions in the papers of Ball and Torous (1984), Barndorff-Nielsen and Shephard (2002), Andersen et al. (2003), Zhang et al. (2005) or Andersen et al. (2012).
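For concreteness, here is a hedged sketch of the standard textbook forms of the Parkinson (1980) and Garman-Klass (1980) estimators (Python; per-period volatility, no annualisation, and the OHLC inputs are hypothetical):

import numpy as np

def parkinson_vol(high, low):
    # Range-based volatility using high/low prices only.
    hl = np.log(high / low)
    return np.sqrt(np.mean(hl ** 2) / (4.0 * np.log(2.0)))

def garman_klass_vol(open_, high, low, close):
    # Adds a close-to-open correction term to the squared high/low log-range.
    hl = np.log(high / low)
    co = np.log(close / open_)
    return np.sqrt(np.mean(0.5 * hl ** 2 - (2.0 * np.log(2.0) - 1.0) * co ** 2))

# Hypothetical daily OHLC data just to exercise the functions:
rng = np.random.default_rng(3)
close = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=250)))
open_ = np.roll(close, 1); open_[0] = 100.0
high = np.maximum(open_, close) * (1.0 + np.abs(rng.normal(0.0, 0.003, size=250)))
low = np.minimum(open_, close) * (1.0 - np.abs(rng.normal(0.0, 0.003, size=250)))
print(parkinson_vol(high, low), garman_klass_vol(open_, high, low, close))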

2.5.4 Skewness and kurtosis

The computation of empirical skewness and kurtosis on a finite sample is not a challenge per se. However, some estimators may behave poorly in the case of distributions with fat tails (see Bouchaud and Potters (2009)): we may recover huge empirical figures that mean nothing. In general, the literature on the subject is not very developed. One contribution is Kim and White (2004), which gathers robust versions of skewness and kurtosis estimators and which we will partly follow here. As we will not explore measures of co-skewness and co-kurtosis, we will consider, for the sake of clarity, a scalar, univariate return series $(R_t)$ with sample observations $(r_t)_{t=1,\ldots,N}$. On this sample, we assume that the unconditional distribution of returns $R$ has an empirical mean $\mu$, an empirical median $\mathrm{med}(R)$, an empirical mode $\mathrm{mode}(R)$, and a volatility $\sigma$. Of course, all those quantities are estimated but we skip the estimator notations for the sake of simplicity. In the same way, quantiles and distributions have to be understood as empirical quantiles and distributions.

2.5.4.1 Skewness estimation

A Gaussian assumption for returns in order to derive a skewness or a kurtosis estimator would be pointless here. Since the values are known and fixed, there would be nothing to estimate. On the contrary, departure from those fixed values will produce tests for the normality of a distribution. As detailed before, the natural sample estimators for skewness and kurtosis are:

$$\hat{s} = \frac{\frac{1}{N}\sum_{t=1}^{N}(r_t - \mu)^3}{\sigma^3} \quad \text{and} \quad \hat{k} = \frac{\frac{1}{N}\sum_{t=1}^{N}(r_t - \mu)^4}{\sigma^4}.$$

A correction for the finite-sample bias of the skewness estimator is given by the adjusted Fisher-Pearson skewness:
$$s_a = \frac{N}{(N-1)(N-2)}\, \frac{\sum_{t=1}^{N}(r_t - \mu)^3}{\sigma^3}.$$
The sensitivity to potential outliers is huge. Estimation is in practice quite challenging and a lot of alternative measures have been tested, implemented and proposed. Explicitly removing outliers is rather subjective and empirically difficult. It is then better to seek robust measures.

In the case of uni-modal, symmetric distributions, the mean, median and mode should be equal. If the distribution is not uni-modal but still symmetric, the equality between the mean and the median should hold, and skewness is equal to zero. Zero skewness does not imply symmetry however, and in the general case anything can happen (skewness different from zero and mean above or below the median), meaning that a cautious use and comparison of estimators as a sanity check is always preferable.

Some measures may however use the distance between moments and quantiles to measure skewness. We recall for instance the Bowley formula (see Bowley (1920)). One also finds the terms "Galton formula" and "Yule-Kendall index".

$$s_B = \frac{Q_3 + Q_1 - 2\,\mathrm{med}(R)}{Q_3 - Q_1},$$
with $Q_1$ and $Q_3$ respectively the 0.25 and 0.75 quantiles of the distribution. This quantity lies naturally between $-1$ and $1$, leading to a "normalized" value of skewness, $-1$ indicating an extreme left and $+1$ an extreme right skewness.

In the same way, the Pearson’s first and Pearson’s second skewness coefficient write:

$$s_{P1} = \frac{\mu - \mathrm{mode}(R)}{\sigma} \quad \text{and} \quad s_{P2} = 3\,\frac{\mu - \mathrm{med}(R)}{\sigma}.$$
See the contributions of Hotelling and Solomons (1932) and Arnold and Groeneveld (1995) for a critical review of skewness measures using the distribution mode or the median.

2.5.4.2 Kurtosis estimation

Alternative measures for kurtosis are scarce. A contribution by Moors (1988) shows that sample kurtosis may be interpreted as a dispersion measure of the distribution relative to the two points $\mu - \sigma$ and $\mu + \sigma$. If the distribution is concentrated around $\mu$ or far in the tails, the value of kurtosis can be huge. Moors (1988) presents a kurtosis-equivalent formula close to the idea used for defining the Bowley formula of skewness, but using octiles rather than quartiles:

$$k_M = \frac{(O_7 - O_5) + (O_3 - O_1)}{O_6 - O_2},$$

where $O_i = F^{-1}(i/8)$ with the previous notations. For a standard Gaussian distribution, $k_M \approx 1.23$.
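Both quantile-based measures are easy to implement; the sketch below (Python, on a simulated Gaussian sample) recovers a Bowley skewness close to 0 and a Moors kurtosis close to 1.23:

import numpy as np

def bowley_skewness(x):
    q1, med, q3 = np.quantile(x, [0.25, 0.5, 0.75])
    return (q3 + q1 - 2.0 * med) / (q3 - q1)

def moors_kurtosis(x):
    o = np.quantile(x, np.arange(1, 8) / 8.0)          # octiles O1, ..., O7
    return ((o[6] - o[4]) + (o[2] - o[0])) / (o[5] - o[1])

rng = np.random.default_rng(4)
x = rng.standard_normal(100_000)
print(bowley_skewness(x), moors_kurtosis(x))           # roughly 0 and 1.23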

2.5.4.3 Statistical properties

As skewness and kurtosis are perfectly known for a Gaussian distribution (expectation and variance fully determine the law), the asymptotic law of their estimators takes a particular form. Under Assumption 2.3, Kendall and Stuart (1977) provide the distribution of the sample estimators $\hat{s}$ and $\hat{k}$:

• $\sqrt{T}\,\hat{s}$ follows asymptotically a $\mathcal{N}(0, 6)$;

• $\sqrt{T}\,(\hat{k} - 3)$ follows asymptotically a $\mathcal{N}(0, 24)$.

The asymptotic variance of the kurtosis estimator is huge. This means that in finite samples it is important to have large values of $T$, since otherwise there is a large chance of obtaining large estimates. Combined with the high sensitivity to outliers (observations enter at the fourth power, which boosts the final estimator value), this explains why empirical values of kurtosis are often very difficult to interpret.

Remark 2.1 The values of skewness and kurtosis (respectively 0 and 3) for a Gaussian distribution make it easy to build normality tests relying on the departure of skewness and kurtosis from those values. Those tests are in general easy to implement in practice, even if they should rely on robust empirical

measures. Such tests are for instance the Jarque and Bera (1980) and the D'Agostino and Pearson (1973) tests that compute a distance between empirical and theoretical moments. See Thode (2002), Doornik and Hansen (2008), Richardson and Smith (1993) and Chapter 2 of Jondeau et al. (2007) for a review and developments on normality tests.
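As a hedged illustration of such a moment-based test, the sketch below builds a Jarque-Bera-type statistic from the asymptotic variances recalled above (6 for skewness, 24 for excess kurtosis); the data are simulated and the chi-squared reading is only asymptotic:

import numpy as np
from scipy import stats

def jarque_bera_stat(r):
    T = len(r)
    m = r.mean()
    s2 = ((r - m) ** 2).mean()
    skew = ((r - m) ** 3).mean() / s2 ** 1.5
    kurt = ((r - m) ** 4).mean() / s2 ** 2
    jb = T * (skew ** 2 / 6.0 + (kurt - 3.0) ** 2 / 24.0)
    return jb, 1.0 - stats.chi2.cdf(jb, df=2)            # statistic and asymptotic p-value

rng = np.random.default_rng(5)
print(jarque_bera_stat(rng.standard_normal(5000)))        # Gaussian sample: large p-value
print(jarque_bera_stat(rng.standard_t(df=4, size=5000)))  # fat tails: p-value near zero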

Main Bibliography

• Rogers and Satchell (1991)

• Gouriéroux et al. (1997)

• Karatzas and Shreve (1998)

• Bouchaud and Potters (2009)

• Roncalli (2013)

• Ielpo et al. (2017)

PART I

EQUITY FACTORS

A biased presentation, from their origins to their dark side, with some simulations...

Chapter 3

Equity Factors: General Presentation

We already tried to approach the notion of factors in Section 1.2.2. Under its most basic form, a factor is the expression of characteristics creating a link between a set of stocks¹ that helps to explain their returns and their risks. The aim of the present section is to provide a more theoretical definition of factors and to see how the financial industry now uses the related academic research. For many years, the CAPM was the only anchoring model helping to understand the underlying nature of the risk carried by stocks, where the cross-section of returns is essentially driven by market risk. But an accumulation of evidence since the 1970s made academics and practitioners realize that many empirical effects were at odds with the theory. The cornerstone paper of Fama and French (1992) delivered a strong message: the systematic risk that any stock bears is multidimensional. But the difficulty is to identify exhaustively the common sources of risk, other than the market. The objective is to identify the sources of risk for stocks. Identifying them allows one to explain stock returns ex post and to draw scenarios on potential risk allocation ex ante.

3.1 The Capital Asset Pricing Model

3.1.1 Lessons from the CAPM

When one looks for the first intuition ever developed on factors in the financial literature, one often finds the work of Graham and Dodd (1934) (see e.g. Cerniglia and Fabozzi (2018)), who worked, at that time, on the value premium. Apart from this historical importance, the influence of this work may seem nowadays quite anecdotal. The two major theories that paved the way for the theoretical developments of modern finance in the 20th century are the Capital Asset Pricing Model (CAPM, already evoked in the introduction) and the Arbitrage Pricing Theory (henceforth APT) of Ross (1976). Those two models may be thought of as equilibrium² models.

While the CAPM links the returns of securities to a main factor formed by market returns, in Ross (1976)'s asset pricing model the return of an individual stock is mapped onto a set of risk factors explaining (hopefully) a significant portion of its variability. The difference is that in the first model, the main factor is identified and explicit: all the influence of the market returns transits into the stock returns through the sensitivity of the stock towards the market return, the beta. In the case of the APT, the set of factors is not made explicit. It is more a conceptual and theoretical framework able to model

¹ Factors are defined independently of the asset class, so please read "assets" rather than "stocks" in the most general form!
² Briefly, an equilibrium model tries to explain the way prices evolve through the study of the interaction and the balance between supply and demand in an open universe. Roughly, general equilibrium models will try to give insights on the economy starting with assumptions on markets and on the behaviour, preferences and actions of individual agents.

the relation between stock returns and potential factors when those are at hand. Those two models are certainly at the heart of the perennial academic habit of multi-factor regression as a common tool to study and understand the risk profile of financial securities.

There are several lessons to retain. The breakthrough of the CAPM is that, for the first time in finance theory, risk is measured as a factor exposure. Therefore, the risk premium (the expected reward) of a given stock is a direct function of the stock's beta. While the specific risk of all assets, when combined, can be diversified away, market risk cannot be diversified and therefore asks for a compensation which is, naturally, the equity premium. We have to add one thing here that will have some echo when we speak about the low-beta anomaly. Within the CAPM, the assets with the lowest betas should have the lowest compensation. Indeed, those stocks offer a strong decorrelation with the market (and are hence super diversifiers), so the compensation they should earn is weak since they already allow to temper losses in bad times. The APT generalizes this approach by assuming that the risk premium earned by an asset is a linear function of the risk premia of the factors. The underlying idea is strong, even if the factors are not explicit: there is not only one factor that can capture the risk. Moreover, the definition of "bad market conditions" is more difficult to elaborate: with only one market factor, bad times are periods where the returns of the market are negative, but what happens with more factors? The name of the theory ("arbitrage") is also due to the fact that, as in the CAPM, not all the risk can be diversified away (the risk of the factors) and that at equilibrium, investors should earn a compensation for bearing those various sources of risk expressed through the factors.

So at this stage, the CAPM is a single-factor model where the risk premium associated to one stock is fully related to the exposure of the stock to the market, and the APT is a multi-factor model where the factors are however left unspecified. Many people have in mind the work of Fama and French (we will come back to it later), who are often celebrated as the fathers of factors, but their work seems in fact to be the apogee of research started in the 70s. During those years, as recalled again by Cerniglia and Fabozzi (2018), some researchers found that some buckets of securities may generate a performance superior to the market. Apart from the market mode, the first factors (even if not necessarily branded under this name) began to appear with the works of Basu (1977) and Banz (1981). Those two works are known because they are the first two contributions highlighting the fact that stock returns may be related, statistically, to the characteristics, properties or features of those stocks. Banz (1981) identified that portfolios made of stocks with small market capitalisation were earning higher risk-adjusted returns than portfolios of bigger stocks. This anomaly is known nowadays under the name Small Minus Big (a portfolio long of small stocks and short of big stocks). Before the name of factors, those characteristics linked to the returns of the stocks gained the name of anomalies, referring to the fact that this property was anomalous when put in front of the CAPM theory. Therefore a market anomaly has to be understood as an anomaly with respect to this seminal theory. As detailed in our introduction, this was a major challenge to explain for traditional finance. At this time, such cross-sectionally defined extra-performing sub-portfolios were already called anomalies.

In Chapter 6 of Ang (2014), six lessons are drawn from the CAPM. In particular, in addition to what we recall here, Ang adds the fact that "each investor has its own optimal exposure of factor risk" and that "the factor risk premium has an economic story". The last lesson interests us since it links the CAPM with the notion of risk premium. Bad times are defined conditionally on the factor state. In the CAPM, bad times occur when the market returns are negative. The risk premium is a compensation for holding the risky assets in bad times. With only one factor, so only one notion of beta, a risky asset is a stock with a high beta. So it is natural to think that holding high-beta stocks should come with a higher compensation for investors. Conversely, stocks with low beta are already diversifiers, so they do not need an extra compensation since they already potentially pay off in market downturns. We will

come back to it later.

Take-Home Message

The main message of the CAPM is that the market factor is central when studying stocks' returns. This market factor defines the core origin of systematic risk, which cannot be diversified away. All the idiosyncratic stock risks, apart from the factor, can be eliminated through diversification. This being said, it is optimal for the investor, from a return perspective, to hold the market portfolio rather than any other portfolio made of the same stocks. This holds on average, since an average investor, at equilibrium, will hold the market. This does not prevent, however, individual investors from having different preferences, different appetites for risk, individual and domestic biases, etc. But they can choose their own exposure to risk as soon as they are aware of the existence of the factor(s). The systematic risk that any stock bears is multidimensional. More strongly, the risk that an investor bears will be directly linked to its factor exposure. This theory with one factor (the market) will extend easily to multi-factor models, and the same message will hold.

3.1.2 Some elementary CAPM theory

It is essential to understand the importance of the influence of factors and how they emerge. One of the main lessons from the literature is the seminal message behind diversification. Most portfolio managers around the world know the famous law stating that by increasing the number of stocks in a portfolio, its volatility is lowered: by doing so, the idiosyncratic risk attached to individual stocks is diversified away, as the risk of one stock is mitigated by the risk of other stocks. Volatility is indeed lowered, but at the expense of something else: as individual stocks' influence over the returns of the portfolio is lowered, another influence grows, the impact of the market portfolio.

Let $R_{i,t}$ be the return of stock $i$ at time $t$. A portfolio mixing different stocks together will have returns $R^p_t$ that write:

$$R^p_t = \sum_{i=1}^{N} \omega_{i,t} R_{i,t}, \qquad (3.1)$$

where $\omega_{i,t}$ is the weight (assumed to be positive) of asset $i$ at time $t$ and $N$ is the total number of assets in the portfolio.

The CAPM as presented in Merton (1973) relates the returns on individual assets to the returns on the market portfolio, a portfolio that best represents the evolution of markets. Usually (see the discussion in Section 1.1.2 of the introduction) this portfolio is chosen to be the market-capitalization benchmark attached to the set of stocks. Let $R_{M,t}$ be the returns on such a portfolio and let $R_{f,t}$ be the risk-free rate, that is, the yield attached to a short-term bond with the lowest achievable credit risk. The CAPM states the following relationship:
$$R_{i,t} = R_{f,t} + \beta\,(R_{M,t} - R_{f,t}) + \epsilon_t,$$
where $\beta$ is the stock's sensitivity to the market portfolio and $\epsilon_t$ is a centered noise usually assumed to be Gaussian for estimation consistency reasons. A natural estimator for $\beta$ is:

$$\hat{\beta} = \frac{\widehat{\mathrm{Cov}}(R_{i,t}, R_{M,t})}{\hat{\mathbb{V}}[R_{M,t}]},$$

where $\widehat{\mathrm{Cov}}(\cdot,\cdot)$ is the estimator of the covariance operator and $\hat{\mathbb{V}}[\cdot]$ the estimator of the variance. The former definition no longer holds in the presence of an $\alpha$ attached to individual stocks (that is, the portion of returns not coming from the market's performance). A proper econometric way of stating this relationship is the following:

$$R_{i,t} = \alpha + R_{f,t} + \beta\,(R_{M,t} - R_{f,t}) + \epsilon_t. \qquad (3.2)$$
Equation (3.2) can be estimated using Ordinary Least Squares estimators for $\alpha$ and $\beta$. Here $\alpha$ is not time dependent, and neither is $\beta$. This is what we will assume for the moment for the sake of simplicity. In an equilibrium model, $\alpha$ should be zero, but the proper econometric way to write it is to include the constant and test for its nullity. Equation (3.2) also makes it possible to measure the explanatory power of the model using the $R^2$ attached to the regression. It is the part of the variability of the returns actually explained by the model. The market portfolio is by essence the first factor: it is a risk that is consistently priced in the cross-section of returns.
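As a simple illustration (and only that), the regression (3.2) can be estimated by OLS as in the sketch below (Python, with simulated excess returns; the true alpha, beta and volatilities are assumptions chosen for the example):

import numpy as np

rng = np.random.default_rng(6)
T = 1000
market_excess = rng.normal(0.0003, 0.01, size=T)                      # hypothetical market excess returns
stock_excess = 1.2 * market_excess + rng.normal(0.0, 0.015, size=T)   # true alpha = 0, beta = 1.2

X = np.column_stack([np.ones(T), market_excess])                      # constant + market factor
coef, *_ = np.linalg.lstsq(X, stock_excess, rcond=None)
alpha_hat, beta_hat = coef

residuals = stock_excess - X @ coef
r_squared = 1.0 - residuals.var() / stock_excess.var()
print(alpha_hat, beta_hat, r_squared)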

ABC Vocabulary

Equation 3.2 defines the terminology around alpha and beta. For a given benchmark, beta is the sensitivity of the return time-series towards the market. Alpha is deduced from the equation as the idiosyncratic premium of the stock.

The terminology extends when a stock return time series is replaced by a mutual fund investment time series. In this case, the alpha is a proxy of the extra-performance generated by the fund manager.

3.1.3 Empirical illustration

We proceed here to a simple numerical experiment. We have seen that diversification helps to reduce the risk of a portfolio and that the obtained risk is lower than the risk of an individual stock. So what is the influence of the number $N$ of stocks in the CAPM? We use here a dataset of individual stocks contained in the S&P 500. The dataset is daily. It starts in January 1999 and ends in March 2016. The experiments presented here are based on geometric returns. A standard way to assess the impact of a growing $N$ on the characteristics of an equity portfolio is to randomly pick $N$ stocks in the investment universe, combine them into a portfolio and then run the regression presented in Equation (3.2). In this experiment, for each $N$ we build 10 000 randomly selected portfolios of size $N$ (10 000 samples). The regression is very similar to Equation (3.2), except that we run:
$$R^P_t = \alpha + \beta R_{M,t} + \epsilon_t, \qquad (3.3)$$
which is equivalent to (3.2) except that, for the sake of simplicity, we assume the risk-free rate to be equal to zero. For each simulation, the $R^2$ of the regression, the $\alpha$ and $\beta$ estimates and the volatility of the portfolio are gathered. $N$ goes from 2 to 50 stocks. Figure 3.1 shows the average values of each of those quantities over the estimates:
$$\tilde{\alpha}_N = \frac{1}{10000}\sum_{i=1}^{10000} \hat{\alpha}_{i,N} \quad \text{and} \quad \tilde{\beta}_N = \frac{1}{10000}\sum_{i=1}^{10000} \hat{\beta}_{i,N},$$
$$\tilde{R}^2_N = \frac{1}{10000}\sum_{i=1}^{10000} \hat{R}^2_{i,N} \quad \text{and} \quad \tilde{\sigma}_N = \frac{1}{10000}\sum_{i=1}^{10000} \hat{\sigma}_{i,N}.$$
Figure 3.1 represents each of those quantities as a function of $N$. The striking conclusions are collected in the following:

[Figure 3.1 about here: four panels (portfolio Volatility, Alpha, Beta and the market factor's R²) plotted against the number of stocks in the portfolio.]

Figure 3.1: R2, α, β and volatility of a portfolio of stocks as a function of the number of stocks in the portfolio. US (S&P500 index), 1999-2015. Taken from Ielpo et al. (2017).

1. Portfolio’s volatility is a decreasing function of the number of stocks. It reaches a limit around 16%, which happens to be the market portfolio’s volatility.

2. As idiosyncratic risk gets diversified, the volatility of the portfolio decreases until the exposure to the market portfolio dominates the bulk of the portfolio’s volatility. This explains this asymptotic behaviour.

3. $R^2$ is clearly an increasing function of $N$. With 2 stocks, the average $R^2$ is around 40%, but it then quickly increases above 50%. For 50 stocks, the average $R^2$ is around 90%. As the number of assets in the portfolio grows, the exposure to the market factor explains most of the performance.

4. Neither α nor β show much variability depending on N: the increase in the R2 is not explained by any dynamic in these parameters.

Those facts are well known for volatility. Indeed, the link between the systematic and idiosyncratic risks and the number of positions is an old problem that was identified early, with the work of Evans and Archer (1968) who, for random subsets of stocks, find a monotonic, inverse relationship between the realized variance of portfolios and the portfolio size. The more assets there are in the portfolio, the lower its variance. Their experiment was built upon random picks of equal-weighted portfolios (made of a number between 2 and 40 assets in a universe of nearly 500 stocks in the S&P composition frozen in 1958, followed afterwards during ten years). In short: the risk of a market portfolio can be approximated and reproduced with only 8-10 securities! This has been confirmed by Fisher and Lorie (1970), who identify that 30 stocks in a portfolio provide a level of diversification similar to 95% of that of the benchmark (again random picks, no factor). But the observation related to $R^2$ is generally forgotten. See Lehalle and Simon (2019) for an in-depth study of the relation between Long Only portfolios (Minimum Variance and portfolios with rewards) and the number of positions. This conclusion is however very important: not actively dealing with risk factors would be a major flaw of any attempt to understand the true nature of a portfolio's return behaviour.

3.2 Factor Theory

A by-product of the CAPM theory is that it is now possible to define what bad times are for assets. If we follow the CAPM, bad times are periods of negative or low returns of the market portfolio. Yet, if there are other drivers than the market for the returns, then it is more difficult to spot bad times. If other drivers than the market are identified, as in the case of the APT, the intuition stays unchanged: those alternative drivers of risk define their own periods of bad times. Factors are then those risk drivers that lead to a risk premium, which is a compensation to investors that hold those factors and suffer losses in the periods of bad times relative to each factor. The definition of compensation, risk and bad times becomes multi-dimensional. Holding assets, investors cannot diversify away the market and those factors. We now have an intuition of what a factor more or less is: an extension of the CAPM, like the APT, where there are other risk drivers than the market.

Take-Home Message

One question remains: do we have any clue about what we are looking for? We will see in Section 3.3.1 what academics consider to be the minimal requirements for a factor model. But the intuition guides us towards a setting such that:

• the factor model is parsimonious and involves as few parameters as possible;

• the factor model is stable: from one period to another or from one asset to another, the parameters do not vary greatly;

• the factor model avoids collinear factors: any additional factor should bring new information and as little redundancy as possible.

3.2.1 Extending the CAPM equation: the three types of factor models

3.2.1.1 Let's write it properly...

If we have to write a factor model in linear form, the most general expression is given by Equation 3.4:

$$R_{i,t} = \alpha_i + \beta_i' F_t + \epsilon_{i,t}. \qquad (3.4)$$
Stop. Let's look closely at this simple equation. In fact, the complexity (if any) of the topic lies only in understanding the assumptions of the model, and the nature and the dimension of the objects

under scrutiny in the model. First let's describe the indices; there are two kinds of them: $i$ denotes the label of financial assets (typically stocks, but please be aware that in many academic papers it can be portfolios, i.e. linear combinations of assets); $t$ denotes time. Then, $R$ is simply the return of the asset under consideration. We will deal in this chapter with $i \in \{1, \ldots, N\}$ and $t \in \{1, \ldots, T\}$. Then $R_{i,t}$ is typically a daily or monthly return of a stock $i$ at the end of period $t$. $F_t = (F_{1,t}, \ldots, F_{K,t})$ are the factors. Let's note that $F_t$ may be time dependent but is not asset-dependent. Factors aim at being common risk drivers, so their realizations are asset-independent. In this chapter, we will note $K$ the number of factors. $\beta_i$ is a matrix or a vector, which represents the sensitivity of returns with respect to the factors. The most accepted names for $\beta$ are factor loadings or factor betas. There are as many loadings as factors ($K$ therefore). Note also that those betas do not depend on time. We will come back to this later. $\epsilon$ is an idiosyncratic perturbation. Depending on the type of model, the exact dimensions of the objects depend on the assumptions we make for each kind of factor model. Let's note finally that, in the general case, factors are only random variables and not necessarily returns!

Assumptions - Regardless of the dimensions of the objects, the same assumptions will generally hold. First, we ask that the factors are stationary (see definition in Section 2.3) with fixed unconditional moments that we note $\mathbb{E}[F_t] = m_f$ and $\mathrm{Cov}[F_t] = \Omega_f$. $m_f$ is naturally a vector of size $K$ and $\Omega_f$ is simply a $K \times K$ symmetric, real-valued matrix. Second, the asset-specific perturbations $\epsilon$ are assumed to be uncorrelated with any factor component. Third, the perturbations are serially uncorrelated for a given asset, and uncorrelated at the same period between two different assets. Namely, $\mathrm{cov}(\epsilon_{i,t}, \epsilon_{j,t'}) = \sigma_i^2\,\delta(i=j)\,\delta(t=t')$. We will note in the following $\Delta$ the $N \times N$ diagonal matrix $\Delta = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_N^2)$. Again, the covariance matrix of all the assets is an $N \times N$ matrix that we note $\Omega$. Advanced specifications of the model are linked to the choices made to estimate parameters such as $\beta$ or $\Delta$ and to the specification of the number of factors. We will keep a general point of view to understand the rationale behind the models but, for the sake of concision, we will leave the estimation puzzle to more advanced courses.

Remark 3.1 Meucci (2014) is really a great reference on Linear Factor Models, with many subtleties that are carefully tackled. In particular, the implicit context of the presented models is when $K$ is lower than the number of assets. Implicitly we are in the context of a dimension reduction objective (describe the returns with parsimony). In a very general case, we may however conceive situations where $N < K$.

3.2.1.2 The three types of factor models

There are mainly three types of factors: statistical factors, macroeconomic factors, and fundamental factors. Of course, things are not so simple. Linear Factor Models are in fact a more subtle concept, for which the cornerstone reference is Meucci (2014). In this reference, a subtle difference is made between a linear factor model and the linear econometric model that we express in Equation 3.4. But no one, in the academic world as well as among practitioners, will take the time to make this clarification. So, to keep things clear and well classified, the three aforementioned types of factors represent the essential part of what is now quite standard in the literature (see e.g. Connor (1995)) and among practitioners.

Macroeconomic factors - The first kind of factors are macroeconomic factors. Beware of the name: it relates more to the context of utilisation than to the econometric technique. In fact, in this first kind of model, contemporaneous versions of $R$ and $F$ are observed. This is in fact the only case of factors that we will study where the realizations of $F_t$ are assumed to be observed. So we can anticipate that the econometric aspect of estimation will not be a real challenge; it generally involves classical econometric tools. The only thing to do is to choose observable time series to explain the cross-section of returns.

Factor Model Type | Inputs Needed | Econometric Techniques | Outputs
Macroeconomic | Stocks' returns and macroeconomic variables | Time-series regression | Stocks' betas
Statistical | Stocks' returns | Time-series, PCA, cross-sectional regression... | Statistical factors and stocks' betas
Fundamental/Style | Stocks' returns, characteristics, financial statements... | Cross-sectional regression | Fundamental factors (opt: stocks' betas)

Table 3.1: Description of the different types of factors (see Connor (1995))

Those time series have to be stock-independent. So the easiest thing to do is to consider macroeconomic time series as observable factors. That's the reason why this kind of model gained its name: according to the common choice of data for the factors, rather than to the underlying mathematical model.

Statistical factors - The second kind of factors is statistical. In other words, they appear as eigenvectors of the covariance matrix of returns. This will be the object of a full treatment in Chapter 9. The idea is that the natural portfolios representing factors emerge directly from the observation of the past returns of assets and from the potential dependence links emerging from those returns. Consequently, neither the betas nor the factors are observed. Both emerge from assumptions on $R$ or on the covariance structure. This is the main difference with the macroeconomic models: the factors here are not observed.

Fundamental factors - The last kind of factors is fundamental factors. They focus on ex-ante information, on characteristics like sector, country, industry, or on financial information such as that available in financial statements (income, revenues, assets, etc.). The use of financial statement (fundamental) information gave the model its name. We also find, alternatively, the term style factor in some contexts. So in this view, stock characteristics help to build the risk drivers, which is not the case in the first two models. Those stock characteristics help in two ways. The first kind of sub-fundamental model assumes that the betas are observed (see Section 3.2.4 below) and the second one uses characteristics to first estimate the factors, before estimating the betas afterwards (see Section 3.2.5 below). In those two sub-cases, the factors are initially unobserved.

As a sum-up, we reproduce in Table 3.1 a nice characterization of factor models as presented in Connor (1995), with the main econometric techniques involved. More or less, regressions will be run along two axes: time-series regression (asset fixed, varying time) when the time series of factors are observed; cross-sectional regression (meaning that time is fixed and the assets and their characteristics vary in the regression) when the characteristics are observed.

An essential quote from Connor (1995):

“The three types of factor models are not necessarily inconsistent. In the absence of estimation error and with no limits on data availability, the three models are simply restatements or (to use a technical term from factor modeling) rotations of one another. (...) In this eclectic view of the world, the three factor models are not in conflict and all can hold simultaneously.”.

Take-Home Message

Do not forget that those models are only attempts to find proxies of unobservable risk drivers. Therefore, ex-ante, no model is better than another and only the ex-post explanatory power of the regressions plus the available data at hand may help us to choose. An important thing to keep in mind is that factor models are helpful in two situations.

First, the equations they support often come from asset pricing models. So in this respect, they may help to understand which part of the risk is priced. This is mainly interesting from an academic point of view. Second, they help to estimate risk, to understand it, and to provide parsimonious estimators of risk.

3.2.2 Macroeconomic Factors

They correspond to the situation where we observe $R$ and $F$ in Equation 3.4 and where the $F$ variables are $K$ time series common to all the stocks for each regression. The $K$ macroeconomic factors have to be found among macroeconomic time series such as interest rates, market risk, industrial production, money growth, inflation measures, commodities' prices, housing or unemployment data, etc., and aim at evaluating the impact of shocks on those variables on assets' returns. Consequently those factors are potentially those that best represent the notion of economic risk and how it translates into equities. So they are the ones that should best represent the intuition of "bad times".

The main model that is applied is simple linear regression. As no cross-information is involved, the regression is the result of a simple OLS regression asset by asset, whether the regressions are stacked or not (see e.g. Connor (1995)). Each time series of stock returns is linearly regressed on the macro shocks; we are therefore in the simplest situation of time-series regression. The regression coefficients are the asset's sensitivities associated with the factors, i.e. the factor betas.

From an estimation perspective, the nice thing is that the OLS estimates provide easy formulas for $\beta$ and also for the individual idiosyncratic variances $\sigma_i^2$ of the stocks. As the factors are observable, both $m_F$ and $\Omega_F$, the unconditional moments, may be estimated through historical moments. In this kind of model, $\beta_i$ is a $K \times 1$ vector.

The first example of such a model is... the CAPM! Stated in this form, this is in fact wrong: the CAPM is a broader model, and an equilibrium model. However, if we assume that $\alpha_i$ is null, $K = 1$ and $F_t = R_{M,t}$ is the return of the market in (3.4), we recover the CAPM equation. Normally $\alpha$ should be absent in the CAPM, but the proper way to deal with it is to include a constant in the regression and test for its nullity. This model is called the Sharpe-Lintner Single Index Model. In this case, for each asset $i$, $\beta_i$ is a scalar and if we note $\sigma_M^2$ the unconditional variance of the market, and $B = (\beta_1, \ldots, \beta_N)'$ (which is an $N \times 1$ vector), some trivial computations give us that:

$$\Omega = \sigma_M^2\, BB' + \Delta.$$

Classical OLS estimation gives us the expected results. Even if it is an illustrative model, it remains that the $R^2$ of the regression measures the proportion of the risk that is explained by the factors, here the market. Indeed, for each asset we have $\mathbb{V}(R_{i,t}) = \beta_i^2\sigma_M^2 + \sigma_i^2$ and $R^2 = \hat{\beta}_i^2\hat{\sigma}_M^2 / \hat{\mathbb{V}}(R_{i,t})$. Therefore the quantity $1 - R^2$ is the asset-specific share of risk. Extensions of this model include time-varying betas, but this is beyond the scope of the present course. The estimation setting does not change when

there are $K$ factors; only the expression of $\Omega$ is modified into:

$$\Omega = B\,\Omega_F\, B' + \Delta,$$

where $B = (\beta_1, \ldots, \beta_N)'$ is this time an $N \times K$ matrix.

Such a model is for instance applied in the famous paper of Chen et al. (1986) for stocks, and by Lo (2008) for hedge or mutual funds' returns. The question that such a model helps to answer is the sensitivity, at the stock level, to shocks on macro variables, in a what-if scenario. What should be the impact on the stocks of an investor's portfolio if there is a shock, say, on oil prices? For stocks such as airlines, we see that the impact of such an event is likely to be very high. But the challenge is to spot, to identify, the potential sources of macro risk for all groups of assets. This makes macro factors not really tradable in practice. As highlighted by Pukthuanthong and Roll (2015), the theoretical advantage of macro factors is straightforward. Indeed, it is documented (Ang (2014)) that macro factors are responsible for the main part of the variation of stock returns, nearly 60% of the market premium (the equity risk premium). Yet the fact that macro variables are released at a very low frequency makes their use of limited interest in practice. Moreover, it is not possible to get an arbitrary exposure or a hedge to macro factors like e.g. inflation directly with stocks. In fact, it is more efficient to try to seize an exposure to macro factors with other financial assets than through trading stocks. As stated by Ang (2014), "asset classes do not move one-for-one with macro factors and many of their movements are perverse or unintuitive. Equities are a claim on real assets yet a terribly inadequate choice for tracking inflation. Real estate is a better inflation hedge. (...) Taking a macro view requires a framework for how macro factors simultaneously affect many asset classes.".
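A minimal sketch of such a macroeconomic factor model is given below (Python, simulated data; the two "macro" series, the loadings and the noise levels are assumptions). Each stock is regressed on the observed factor series, and the implied covariance $\Omega = B\,\Omega_F\,B' + \Delta$ is rebuilt from the estimates:

import numpy as np

rng = np.random.default_rng(8)
T, N, K = 360, 6, 2                                   # periods, stocks, macro factors
F = rng.normal(0.0, 0.02, size=(T, K))                # hypothetical macro factor surprises
B_true = rng.uniform(-1.0, 1.0, size=(N, K))
R = F @ B_true.T + rng.normal(0.0, 0.03, size=(T, N))

X = np.column_stack([np.ones(T), F])                  # constant + observed factors
coef, *_ = np.linalg.lstsq(X, R, rcond=None)          # one OLS regression per stock (stacked)
B_hat = coef[1:].T                                    # N x K matrix of factor betas
resid = R - X @ coef
Delta_hat = np.diag(resid.var(axis=0))                # diagonal idiosyncratic variances
Omega_F_hat = np.cov(F, rowvar=False)

Omega_model = B_hat @ Omega_F_hat @ B_hat.T + Delta_hat
print(np.round(Omega_model, 5))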

Main Bibliography

• Chen et al. (1986)

• Lo (2008)

• Ang (2014)

3.2.3 Statistical Factors

As stated before, in this kind of model, only the asset returns are used and the factors and the loadings are inferred directly from the data. Using those factors has one main drawback: they are only the reflection of phenomena observed in the past. So for those factors to hold, we need them to be at least stable in time. On paper, this sounds great! But if there is no stationarity or if the timespan of the estimation set is too long when compared to the resilience of the factor, there is no use for them.

The two canonical sub-approaches for finding statistical factors are PCA and what is called Factor Analysis, which in spite of its very broad name is really a particular case of statistical factor model. The PCA approach will be treated specifically in a later chapter, so we will stay rather descriptive in this section, adopting mainly a bird's-eye view. One assumption to keep in mind is that if we want those models to behave nicely, we have to assume that $T > N$. In the situation where there are more assets than points in the time series, there are ways to adapt the model, but this is again beyond the scope of the present course.

3.2.3.1 Factor Analysis

The split between Factor Analysis and PCA may appear a bit artificial, but we make this distinction since for methods strictly called Factor Analysis in the literature, the loadings are estimated first, whereas in the case of PCA, the factors are estimated first with an iterative procedure (more or less).

The idea is to exploit Equation 3.4 with the additional assumption that the factors are orthogonal and centered. Said otherwise, in this kind of model, Equation 3.4 is frozen at each time period and the regression is therefore a time-series regression. We can then fix $t$ and stack all assets in the same return object, meaning that $R_t$ is an $N \times 1$ vector, as are $\alpha$ and $\epsilon_t$. $\beta$ is an $N \times K$ matrix, and $F_t$ is a $K \times 1$ vector. The assumptions we keep are the following:

• $F_t$ and $\epsilon_{t'}$ are uncorrelated, for each $t$ and $t'$;

• $\mathrm{Cov}(\epsilon_t) = \Delta = \mathrm{diag}(\sigma_i^2)$;

• $\mathbb{E}[\epsilon_t] = 0$.

The second assumption is in fact very strong. $\mathrm{Cov}(\epsilon_t)$ may assume more sophisticated forms, allowing for serial correlation and heteroskedasticity. But following Bai and Ng (2002), a strict factor model is based on the assumption that this matrix is diagonal, therefore not allowing for correlation between two different idiosyncratic components. See Bai (2003) and Bai and Ng (2002) for reference. The common factors are assumed to be unobservable and must thus be measured from the cross-section of the variables $R_{i,t}$, and only a subset of them is expected to be statistically significant. Some additional assumptions are generally added on the factors, assumed to be centered and uncorrelated:

• $\mathbb{E}[F_t] = m_F = 0$;

• $\mathrm{Cov}(F_t) = I_K$.

The consequence is that $\Omega$ rewrites as
$$\Omega = \beta\beta' + \Delta$$
and $\mathbb{V}[R_{i,t}] = \sum_{k=1}^{K}\beta_{i,k}^2 + \sigma_i^2$. Let's note that such a problem is over-identified since, for any rotation matrix $\Theta$ such that $\Theta'\Theta = I_K$, $\Theta' F_t$ is also a solution to this problem with loadings rotated such that the new loadings are $\beta\Theta$. Clearly, for such a model, interpretation is not the key. We look more to decompose and explain the risk rather than to interpret the origins of the risk drivers. The procedure to estimate such a model is generally to first estimate $\beta$ and $\Delta$ before reconstructing $F_t$, and potentially to look for a $\Theta$ in order to ease interpretation. Generally, to estimate $\alpha$, $\beta$ and $\Delta$, one uses MLE with classical assumptions on the returns, assumed for instance to be normally distributed and temporally i.i.d. Once we get estimators $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\Delta}$, it is easy to observe that $R_t - \hat{\alpha} = \hat{\beta} F_t + \hat{\epsilon}_t$ and that an estimator of $F_t$ is available with Generalized Least Squares; for each time period $t$ we estimate:

$$\hat{F}_t = \left(\hat{\beta}'\hat{\Delta}^{-1}\hat{\beta}\right)^{-1}\hat{\beta}'\hat{\Delta}^{-1}\left(R_t - \hat{\alpha}\right).$$

See e.g. Anderson (2003) as a reference. Let's note that this whole procedure works for a fixed $K$. So we need to start with a guess on the suitable number of factors $K$; this is not a by-product of the estimation just described. We leave it to the reader to gather, from further reading on those models, the rules used to determine the suitable number of factors to include in the final model. Those rules are always a balance between parsimony and explanatory power, evaluated via a loss function. Main papers to read on the topic include the works of Connor and Korajczyk (1993), Bai and Ng (2002), and Alessi and Capasso (2010).
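The last step above (recovering the factor realisations by GLS once alpha, beta and Delta are estimated) can be sketched as follows (Python; the estimation of alpha, beta and Delta themselves, e.g. by MLE, is not reproduced, and all inputs are simulated):

import numpy as np

def gls_factors(R, alpha_hat, beta_hat, delta_hat):
    # R: T x N returns; beta_hat: N x K loadings; delta_hat: length-N idiosyncratic variances.
    W = np.diag(1.0 / delta_hat)                             # Delta^{-1}
    A = np.linalg.inv(beta_hat.T @ W @ beta_hat) @ beta_hat.T @ W
    return (R - alpha_hat) @ A.T                             # T x K estimated factor series

# Hypothetical inputs just to exercise the function:
rng = np.random.default_rng(9)
T, N, K = 500, 8, 2
beta = rng.normal(size=(N, K))
F = rng.normal(size=(T, K))
R = F @ beta.T + rng.normal(0.0, 0.5, size=(T, N))
print(gls_factors(R, np.zeros(N), beta, 0.25 * np.ones(N))[:3])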

3.2.3.2 PCA: Principal Component Analysis

The topic will be detailed at length in Chapter 9, as PCA is in fact related to the eigen-analysis of the covariance matrix of assets $\Omega$. We adopt here a more general point of view to describe the method intuitively. The method starts again with the covariance matrix of assets, and aims at reducing the dimension of the problem while explaining as much information as possible. An eigenvector is a vector of size $N$ with real-valued weights, meaning that those principal components are only linear combinations of returns. The idea is to keep the $K$ most informative factors, built recursively (the definition of the second one depends on the first one, etc.). The vectors are required to be orthogonal to each other and to have a unit norm. The unicity of the decomposition is not a request of such a model. Therefore the optimization problem to find the first component $v_1$ is simply:
$$\max_{v_1}\; v_1'\Omega v_1$$
such that $v_1$ is a real-valued vector of size $N$ with $v_1'v_1 = 1$. The second component is again a real-valued vector of size $N$ obtained through:
$$\max_{v_2}\; v_2'\Omega v_2$$
with again $v_2'v_2 = 1$ but with the additional condition that $v_1'v_2 = 0$. At each step, the new vector should lie in the space orthogonal to the vector space spanned by all the previous components. Again, this is detailed in a later chapter; the only thing to keep in mind at this stage is that the proportion of variance explained by the $K$ first principal components is simply $\left(\sum_{i=1}^{K}\lambda_i\right)/\left(\sum_{i=1}^{N}\lambda_i\right)$, where the $\lambda_i$ are the eigenvalues of $\Omega$ sorted in descending order.

Once the $K$ eigenvectors $v_1, \ldots, v_K$ are recovered, they are all homogeneous to portfolios; this means that we have not recovered factors yet. $F_t$ can therefore be estimated by stacking the quantities $v_i' R_t$ for $i = 1, \ldots, K$. This gives us an estimate $\hat{F}_t$, and classical econometric methods (OLS again) may be applied asset by asset.
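A compact sketch of this PCA route is given below (Python, simulated correlated returns; the number of factors K is an arbitrary choice): extract the leading eigenvectors of the sample covariance, form the factor series from the eigenvector portfolios, then regress each asset on them.

import numpy as np

rng = np.random.default_rng(10)
T, N, K = 1000, 10, 2
R = rng.normal(size=(T, N)) @ rng.normal(size=(N, N)) * 0.01   # hypothetical correlated returns

Omega = np.cov(R, rowvar=False)
eigval, eigvec = np.linalg.eigh(Omega)                  # eigenvalues in ascending order
order = np.argsort(eigval)[::-1]
explained = eigval[order][:K].sum() / eigval.sum()      # share of variance explained by K factors
V = eigvec[:, order[:K]]                                # N x K leading eigenvectors

F_hat = R @ V                                           # T x K statistical factor series
X = np.column_stack([np.ones(T), F_hat])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)            # asset-by-asset OLS on the factors
print(explained, coef[1:].T.shape)                      # explained share and (N, K) loadings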

Remark 3.2 If we note $\Omega = V\Lambda V'$ the spectral decomposition of the covariance matrix (exactly in the way we just did it supra), it is easy to see that, with the notations of Section 3.2.3.1, the choice of $\Theta = V\Lambda^{-\frac{1}{2}}$ allows to recover orthonormal factors.

Remark 3.3 This optimisation program is equivalent to maximizing (over $\alpha$, $\beta$, and the factors) the objective function equal to the R-squared of the multivariate regression of the asset returns in Equation 3.4. See Meucci (2014) for instance.

3.2.4 Fundamental Factors with observable betas

Fundamental factors illustrate a case where asset-specific characteristics are used to recover the factors. However, there are several ways to do it, and this illustrates that factor models are only... models that try to uncover the nature of risk drivers, but are not illustrative of a truth. Stock characteristics are sometimes quite slow (financial statements) or never changing (belonging to a sector, an industry, a geographical zone). Paradoxically, even if this information may be slow or static (when compared to stock returns), this is the kind of factor that is the most used, widely explored in the academic literature, and largely used in Smart Beta products. When practitioners speak about "equity factor models", they implicitly refer to this kind of model most of the time. But at this point, we cannot specify Equation 3.4 much further. We need to fork and see how the stock characteristics are used in practice. In this precise case, we will assume that the loadings, i.e. the betas, are fixed and depend on the assets. Yes, this means that the $\beta_i$ will be built/observed thanks to the assets' characteristics, the factors remaining to be estimated.

This approach is the one chosen by BARRA Inc., the company that built and sold the risk model software that is, more or less, the most widespread in asset management companies. This explains why one sometimes finds the name BARRA used for the model, since the approach is at the heart of the industrialized model. The model is heavily discussed e.g. in Grinold and Kahn (2000). In this approach, the factor betas are fixed, depend on assets and are assumed not to vary in time. The challenge is then to estimate the factor realisations $F_t$. For this, $T$ regressions will be repeated to estimate the factors period by period. The aforementioned assumptions still hold and in this case the $\beta_i$ are considered as observed data, the residuals of the model being again potentially heteroskedastic across assets; the weighted least squares³ estimate is given by:

$$\hat{F}_t = \left(\beta'\hat{\Delta}^{-1}\beta\right)^{-1}\beta'\hat{\Delta}^{-1} R_t.$$

Typically the model rarely employs a constant, and note that here the betas are not estimated (contrary to the case of Factor Analysis) since they are observed. What do those betas look like? They are various and may have a large dimension. The professional model uses a lot of proprietary transformations and adjustments but, for instance, it is known that the industry of the stock is translated into sparse betas, with as many betas as industries, with 0-1 dummies depending on whether the stock belongs to the industry or not.
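A hedged sketch of this period-by-period estimation with observed dummy betas is given below (Python; industries, idiosyncratic variances and returns are all simulated assumptions, far from the proprietary adjustments mentioned above):

import numpy as np

rng = np.random.default_rng(11)
N, T, K = 200, 12, 5                                   # stocks, periods, industries
industry = rng.integers(0, K, size=N)
B = np.zeros((N, K))
B[np.arange(N), industry] = 1.0                        # observed 0-1 industry loadings
F_true = rng.normal(0.0, 0.02, size=(T, K))
R = F_true @ B.T + rng.normal(0.0, 0.05, size=(T, N))

delta_hat = R.var(axis=0)                              # crude proxy of idiosyncratic variances
W = np.diag(1.0 / delta_hat)
A = np.linalg.inv(B.T @ W @ B) @ B.T @ W               # K x N weighted least squares operator

F_hat = R @ A.T                                        # one cross-sectional estimate per period
print(np.corrcoef(F_hat[:, 0], F_true[:, 0])[0, 1])    # high correlation with the true factor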

3.2.5 Fundamental Factors with estimated factors: Fama and French

At the beginning of the 90s, the original work of Fama and French (1992) became an inspiration for a huge body of academic work over the next two decades and irrigated a complete stream of the financial industry. The core of their work is that they identified systematic, persistent and interpretable sources of risk which could be qualified as persistent drivers of returns. In addition to the market, Fama and French identified the first two explanatory factors (still considered as such in the equity sphere as of today). The first is a portfolio based on market capitalization, focusing on the Small Minus Big effect (SMB). The second factor is the Book-to-Market, more commonly called the Value factor: this characteristic is built by comparing the market value of the stock (price times number of shares) with the accounting value of the company. This factor is often called High Minus Low (or HML). Those two factors are then number two and number three. The market remains, like in the CAPM, a risk driver of the stocks in the cross-section. The market remains... the market, still potentially affected by macroeconomic factors like inflation for instance. We will explore soon, in Section 3.2.5.1, the exact details of the construction of the factors, yet the studies show that the exposures to those factors (including the beta to the market) may change over time and potentially increase in magnitude in crisis periods (the so-called bad times). We will see that, in fact, Size and Value are what we call mimicking portfolios. We will ask later the question of what a factor really is, yet in the Fama-French model, the additional factors that are used need to include long and short positions to really differ from the market.

Why is this model so famous among practitioners and academics? The fact that factors other than the market may have tight links with returns was not new in itself. This was also at the heart of the one effect-one anomaly approaches of Basu (1977) or Banz (1981), already on Value-like and Size-like effects. What is really new here is that it is not a one-by-one effect, but a collective construction. One advantage that helped the model to become famous is also that it remains parsimonious (only three factors). What is crucial is the attempt to explain the variance of returns in the cross-section with this set of style factors.

³ WLS is in fact just a special case of GLS. GLS is the general case when errors are potentially correlated / dependent, while WLS is used when errors are independent but not identically distributed.

The message was quite disruptive in the 90s for academics as well as for practitioners, after years where the market risk premium was believed to be the only strong driving force of stock returns. According to the message of the APT, if there are other factors than the market, and not all risk can be arbitraged away, this portion of risk should provide, at equilibrium, a premium. Value and Size exploit the fact that stocks sharing a certain degree of proximity along the Value and the Size criteria will move accordingly.

The fact that a premium should be the reward for assuming the Value risk is a different matter from the question of why such premia do exist. Fama and French did not answer this question at the time, in 1992. Therefore, a quest for the interpretation of those two factors (size SMB and value HML) began and is still alive. We will try to review the literature on the topic in Chapter 5. In a few words, the common understanding is that the value premium is more a reward for potential distress risk and that the size effect is a compensation for investors willing to accept the risk of small stocks (with small liquidity and low visibility). The two factors benefit from a risk rationale, yet really interpreting Size remains, as we will see, a challenge.

3.2.5.1 Factor construction

Take-Home Message

You may often hear about the Fama and French model. Beware: keep in mind the general message but do not spend too much time on the fine details of the experimental setting across the various papers. Indeed, what Fama and French built in their papers (the message stays the same yet the setting changes through time and publications) is really specific. So it should be taken as it is, with its peculiarities and its breakthroughs. But remember, it is basically an empirical asset pricing model. And as they recall in Fama and French (2015), this kind of model “works backward” and is mainly designed to “capture patterns in average returns” and dissect “the relation between average returns” and explanatory variables.

Technically, to build their model, Fama and French (1992, 1993) start by using portfolios. This is very important to understand since it partly explains the huge regression figures that they obtain. First they use the market beta and another criterion (say size or book-to-market) to stratify their stock universe. They do it in a rolling fashion, but still, each time they compute their regressions, they cut their universe along those two criteria. Say they split the universe into $N$ buckets along beta and, in turn, $N$ buckets along the second variable within each of them: we are left with $N \times N$ portfolios (with $N$ equal to 5 or 10 for instance, depending on the paper). Let us call them regression portfolios in the following, for the sake of clarity, even if this term is purely local to our discussion. Those regression portfolios then have the same number of stocks and each portfolio is quite homogeneous in terms of the sorting variables. They rebalance typically twice a year. For each period $t$, the portfolios have a return $R_{i,t}$, where $i$ is the index of the portfolio and varies between 1 and $N \times N$.
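To fix ideas, here is a minimal sketch in Python of such a double sort (not the exact Fama-French procedure, whose details change across papers), assuming a hypothetical DataFrame with columns beta, size and ret for one cross-section of stocks:

```python
import pandas as pd

def double_sort_portfolios(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """Assign each stock to one of n x n regression portfolios by a conditional
    double sort on a (pre-estimated) market beta and a second characteristic
    (here 'size'), then compute equal-weighted portfolio returns.
    df columns (hypothetical): 'beta', 'size', 'ret'."""
    df = df.copy()
    # First sort: n quantile buckets along beta (equal number of stocks per bucket)
    df["beta_bucket"] = pd.qcut(df["beta"], n, labels=False)
    # Second sort: n buckets along size, within each beta bucket
    df["size_bucket"] = df.groupby("beta_bucket")["size"].transform(
        lambda s: pd.qcut(s, n, labels=False)
    )
    # Equal-weighted return of each of the n x n regression portfolios
    return df.groupby(["beta_bucket", "size_bucket"])["ret"].mean()
```

In a rolling implementation, one would recompute the buckets at each rebalancing date and keep the assignment until the next rebalancing.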

Remark 3.4 This stratification is used in order to account for the estimation biases of the market beta, which is biased downwards for low betas and upwards for high betas, the argument being pushed in Fama and French (1992) (see also discussions in Fama and MacBeth (1973)). However, one may ask whether it is econometrically correct to pre-stratify the data before a regression using the same variables as explanatory variables. This would have to be discussed, even if in the papers the authors are well aware of the potential biases and links between the explanatory variables.

Second, in order to build the regression models, they now need explanatory variables which will be, in turn, portfolio returns (but not the returns of the regression portfolios defined above). Those portfolio returns and market returns, included in the model, are the same for each regression. The portfolios helping to build the explanatory returns are therefore different from the regression portfolios. So, in addition to the market, they build six portfolios, using stocks listed on the NYSE, AMEX and NASDAQ exchanges. They do a kind of style boxing: they divide the universe into 2 categories along Size (Small and Big, respectively S and B), where the breakpoint is the median value of the NYSE market as measured at the end of June, rebalancing in July. For each sub-category, Small and Big, they split the two sub-portfolios into three buckets along Value, where the breakpoints are the 0.30 and 0.70 quantiles of the book-to-price ratio measured in December, for a portfolio rebalancing occurring the next January. For Value there are then three buckets: Value, Neutral and Growth (V, N and G). The fact that there are three buckets for Value and not for Size is justified by the greater role attributed to Value. The model is really a model from the 90s, with no daily trading and slow accounting data. There are a lot of adjustments and features in the stock selection that we do not detail here. Six portfolios can then be defined: SV, SN, SG, BV, BN and BG. Each portfolio, as a basket of securities, has aggregated returns. The two factors SMB and HML are then computed as return time-series where:

$$SMB_t = \frac{1}{3}\left(R_{SV,t} + R_{SN,t} + R_{SG,t}\right) - \frac{1}{3}\left(R_{BV,t} + R_{BN,t} + R_{BG,t}\right),$$

$$HML_t = \frac{1}{2}\left(R_{SV,t} + R_{BV,t}\right) - \frac{1}{2}\left(R_{SG,t} + R_{BG,t}\right).$$

Again, those returns are observable for each time period. For each regression portfolio $i$, the regression model that Fama and French state is a time-series one:

$$R_{i,t} = \alpha_i + \beta_i R_{M,t} + \beta_i^{SMB}\, SMB_t + \beta_i^{HML}\, HML_t + \epsilon_{i,t}, \qquad (3.5)$$

where the coefficients are determined through classical OLS with i.i.d. perturbations $\epsilon_{i,t}$. The coefficients $\beta$ in Equation (3.5) do not determine, in isolation, the exposure of the returns with respect to the factors. What matters is the statistical significance of those coefficients, not their value itself: since the regression is made jointly, it is difficult to interpret the coefficients individually. Those coefficients are, in the model, assumed to be centered around zero in the cross-section (so they can be positive or negative), but the betas of a given regression portfolio are assumed to be constant in time. This means, as expressed by Ang (2014), that “the average stock has only market exposure”. Normally, one should add a constant in the regression of Equation (3.5). But, philosophically, as the model is an extension of the CAPM, the constant term is finally excluded from the final model. The true story is that Fama and French do include the constant in the equation but test for its non-significance. Therefore, the zero-intercept model already gives a “parsimonious description of average returns”. Indeed, the model gives good results with $R^2$ between 0.90 and 0.95, which is excellent.
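As an illustration only, a minimal sketch of the time-series regression of Equation (3.5) for one regression portfolio, assuming hypothetical, already-aligned return arrays r_p, r_m, smb and hml:

```python
import numpy as np
import statsmodels.api as sm

def fama_french_ts_regression(r_p, r_m, smb, hml):
    """Time-series OLS of one regression portfolio's returns on the market,
    SMB and HML factor returns (all 1-d arrays of identical length).
    The intercept (alpha) is kept in the regression and then tested for
    non-significance, as discussed in the text."""
    X = sm.add_constant(np.column_stack([r_m, smb, hml]))
    fit = sm.OLS(r_p, X).fit()
    # fit.params = [alpha, beta_mkt, beta_smb, beta_hml]
    return fit.params, fit.tvalues, fit.rsquared
```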

Fama and French insist on the fact that their model is an equilibrium model, like the APT for instance. In particular, Ang (2014) recalls that the market is Size and Value neutral and that “as the average investor holds the market, the average stock does not have any size or value tilt. It just has market exposure”. One pitfall of the model is that, in practice, the betas of the stocks on the factors are not constant. And this variation is in itself, as recalled by Ang (2014), an additional source of risk, as those sensitivities tend to increase in bad times. Moreover, let us stress that Fama and French did not claim that Size and Value are risk factors per se, but rather that those variables, in the observable world and in addition to the market, are proxies for the unobservable risk factors affecting the commonality of stocks’ returns. Fama and French do not provide, at the time they wrote the paper, the economic rationale explaining why those two precise factors⁴ capture this pattern. They even recall in Fama and French (1995) that “Size and Value remain arbitrary indicator variables that, for unexplained economic reasons, are related to risk factors in returns”. The rebuilt factors are only made of portfolio returns providing “combinations of two underlying unknown risk factors or state variables”.

⁴They however link, in the 1995 paper, Size and Value to profitability and earnings.

Remark 3.5 To be fairly honest, there is another test in the Fama and French papers. In a second step, the out-of-sample returns (at the stock level) are regressed once again on the betas estimated in Equation (3.5). This second regression helps to determine the final premium. This is called (we take a lot of shortcuts) a Fama-MacBeth regression, as it uses the framework of Fama and MacBeth (1973), to which we refer. The problem is that, as with the CAPM, Size or HML do predict the returns, but the betas on those characteristics do not (it does not work so well, indeed). The problem is still open.
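For completeness, a heavily simplified sketch of the second-step (Fama-MacBeth) cross-sectional regressions, with hypothetical inputs: at each date, returns are regressed on the first-step betas, and the premia are the time averages of the cross-sectional slopes.

```python
import numpy as np
import statsmodels.api as sm

def fama_macbeth_premia(returns, betas):
    """returns: (T, N) array of stock or portfolio returns.
    betas: (N, K) array of factor loadings estimated in the first step.
    Returns the time-averaged premia and their (naive) t-statistics."""
    T = returns.shape[0]
    X = sm.add_constant(betas)                                   # (N, K+1)
    # One cross-sectional regression per date: slopes are the period premia
    lambdas = np.array([sm.OLS(returns[t], X).fit().params for t in range(T)])
    premia = lambdas.mean(axis=0)
    t_stats = premia / (lambdas.std(axis=0, ddof=1) / np.sqrt(T))
    return premia, t_stats
```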

Main Bibliography

• Fama and MacBeth (1973)

• Fama and French (1992, 1993, 1995, 1996)

• Fama and French (2008, 2012)

• Fama and French (2015, 2018)

3.2.5.2 Extensions

In fact there are more multi-factor models: Fama and French elaborated the most famous one, but there are others. Some years after the work of Fama and French (1992), a third (non-market) factor was identified: the Momentum factor⁵. The momentum effect has long been documented, with a major contribution of Jegadeesh and Titman (1993) in 1993. Carhart proposed to add the Momentum factor to the Fama-French factor model in Carhart (1997). The Momentum effect is the most intuitive effect we can think of: we keep on buying the stocks whose recent returns are positive, and selling the stocks whose returns are negative. The generalization of this intuition is to think that future returns are linked to past ones and to buy the winners and sell the losers among peers, where the criterion is often approximated by a one-year return (lagged by one month). This effect is generally explained by behavioural biases of investors, who are said to under-react to news or information on winners. A clear risk-based explanation is still missing, as we will explain in Section 5.1.2.
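As an illustration, a minimal sketch of the usual 12-minus-1 month momentum criterion, assuming a hypothetical DataFrame of monthly prices (one column per stock):

```python
import pandas as pd

def momentum_signal(prices: pd.DataFrame) -> pd.DataFrame:
    """12-month return lagged by one month: P(t-1) / P(t-13) - 1.
    prices: monthly prices with a DatetimeIndex, one column per stock."""
    return prices.shift(1) / prices.shift(13) - 1.0

# The cross-sectional ranking of this signal then defines the winners (to buy)
# and the losers (to sell) among peers.
```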

Carhart (1997) builds his portfolio in a very similar way to Fama and French, forming a WML portfolio (Winners Minus Losers, simply!) by splitting along Size (separating between Small and Big) and, within each, along three buckets of Losers, Average and Winners computed on one-year performance:

$$WML_t = \frac{1}{2}\left(R_{S,W,t} + R_{B,W,t}\right) - \frac{1}{2}\left(R_{S,L,t} + R_{B,L,t}\right).$$

Momentum and Value are in particular negatively correlated within an asset class on a given zone, since they share opposite sensitivities with respect to liquidity risk: Momentum shows a negative link to liquidity risk, whereas Value is positively linked to it, as explained by Asness et al. (2013). They show in particular that this negative correlation holds not only for equities but also for government bonds, indices, commodities and currencies. This correlation is believed to be mild in quiet conditions but spikes up in absolute value in bad times.

⁵We usually talk about momentum for stocks, while trend is more often used for directional strategies on futures. Momentum needs a long-short investment on many assets whereas trend may be directional on a single asset.

Fama and French kept on working on their model. In 1996, Fama and French (1996) showed that their original model was able to capture the effect of other factor candidates such as the long-term reversal of De Bondt and Thaler (1987) or other Value-like factors (cash flow to price). But in 2015, they finally extended their model in Fama and French (2015) into a five-factor model. The main change is the addition of a Quality and an Investment factor (aka the CMA factor, which stands for Conservative Minus Aggressive), while still rejecting a Momentum factor. CMA, aka Investment, is built by ranking firms according to how aggressively they invest. The definition of this last factor is quite fuzzy, but the idea is that a higher investment level is related to lower expected returns. What is called investment is related to the growth of total assets; the easiest definition of “assets” could be the total assets of the companies, even if in the original paper Fama and French try other proxies of this variable. Fama and French are reluctant to add Momentum mainly for risk reasons, as the risk of the anomaly changes so quickly that its speed of variation is at odds with the natural variation of risk in the equity cross-section. The five factors are then the market, Size, Value, Profitability (aka Quality) and Investment. The five-factor model outperforms the original three-factor model on all metrics and generally outperforms other models. But one key finding is that the role of Value (HML) appears redundant, since excluding Value does not change much the performance metrics of the model.

There are of course many competing factor models. We can cite the six-factor model of AQR (2014), which includes the market, Size, Value (HML), Profitability/Quality, Investment (CMA) and, contrary to the Fama-French five-factor model, the Momentum factor. In 2015, Hou et al. (2014) implemented a q-factor model where the market, Size, Investment (CMA) and Profitability are sufficient to explain the cross-section of average stock returns.

Main Bibliography

• Carhart (1997)

• Hou et al. (2014)

• AQR (2014)

3.3 Open questions

3.3.1 Factors, the minimal requirements

At this point, we grasp what a factor should be: it is (at least for a fundamental factor) equivalent to a portfolio of stocks, defined in a transparent way, that helps to explain in the cross-section the returns of stocks seen as a whole. But this is in fact not sufficient. There are many more questions, and not all of them have a clear answer. In fact, Cochrane (2011) explicitly asks the questions of interest to define factors, questions that are still at the heart of academic work on equity factors up to now. Cochrane asks whether the so-called equity factors are independent, and which ones are crucial and able to move prices. Those are elementary yet fundamental questions, intrinsically linked to their factor nature, in the light of what has been described previously. Of course we will not be able to answer all of those questions precisely, but we will do our best to give an overview of elements of answer.

Cochrane (2011) also uses an important term, since the current research on factors is described in the paper as being a factor zoo. The term is important because it describes well the profusion of research on new potential candidates. We will come back to it in particular in Section 6.1. The term factor zoo is also used quite often by many researchers studying this explosion of factor candidates. The word zoo refers both to the huge number of attempts and to the erratic way of justifying the factor nature of those candidates. The integrity of the research and the statistics used to back the potential discovery is often questionable. Driven by the appetite for publication on a trendy topic, many researchers are often light on the statistical foundations of their studies. This opens the way to new questions on how to define factors and their robustness.

The question of what a factor should be is really important, since it should allow one to separate between the numerous candidates and to ask sound theoretical questions. Dimson et al. (2017) give an explicit definition: “A factor must be persistent over time, pervasive across markets, robust to different definitions, intuitive to common sense, and investable at reasonable cost”. Some researchers have tried to define guidelines that allow one to determine whether a given anomaly should be named a factor or not; the work of Feng et al. (2019) is such an example. Ang (2014) defines four essential features for a factor to be qualified as such. Those features are quite pragmatic. First, a factor “should have an intellectual foundation, and be justified by academic research”. Second, and linked to the returns, a factor should “have exhibited significant premia that are expected to persist in the future”, which is quite natural. A third hurdle, a feature not often quoted by academics, is that factor candidates should “have return history available for bad times”. A last requirement, interestingly linked to costs and implementation as detailed in Section 6.3, is that a factor should be “implementable in liquid, traded instruments”.

The paper of Fama and French (2018) is another example: in it, Fama and French propose a sophisticated use of regressions of each factor candidate on the other factors. They end up with a final model made of Size, Value, Quality, Investment and Momentum. We however deeply encourage the reader to have a look at the paper of Pukthuanthong and Roll (2015), which is quite specific, a bit high-level, but really interesting on the topic. The authors pursue the ambitious goal of determining a procedure to decide whether a new factor candidate really is a factor. The main message is that factors should be related to a Principal Components Analysis of the returns of the stocks, but that what we observe in the real world are probably only projections of the true, unobservable risk drivers. The good aspect of this paper is that it asks fundamental questions and approaches the definition of what a factor should be. The authors clearly state that “factors movements should not be easily predictable” and that usual slow characteristics of stocks (firm size e.g.) should not be factors, but can however be related to returns, in case those characteristics have some overlap or alignment with true factors (or because they are just arbitrage opportunities). They recall for instance that Daniel and Titman (1997) underline that Book-to-Market (a.k.a. a usual implementation of the Value factor) is not a factor but may however be related to an unobservable factor, being its projection in the real world. The authors also recall that, contrary to the results of Fama and French (2018) and to what is said before, Ilmanen et al. (2019) find that Size is a factor, Value might be one and Momentum is not. In the idea of Pukthuanthong and Roll (2015), a factor should be related to the eigenvectors of the PCA of stock returns. But as a PCA only estimates sets of linear combinations, even if it is related to true underlying factors, we do not observe factors directly, only a non-stationary combination of risk drivers in general, made of non-diversifiable factors, diversifiable factors, and characteristics such as industries, etc. A strong idea of the paper is that even if it is not a factor, a rejected candidate can still be interesting for investors if it has positive performance!
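To make the idea concrete, here is a minimal sketch (our own illustration, not the actual procedure of Pukthuanthong and Roll (2015)) relating a candidate factor to the principal components of stock returns; all inputs are hypothetical arrays:

```python
import statsmodels.api as sm
from sklearn.decomposition import PCA

def relate_factor_to_pcs(stock_returns, candidate_factor, n_components=10):
    """stock_returns: (T, N) array of stock returns.
    candidate_factor: (T,) array of returns of the factor candidate.
    Regress the candidate on the first principal components of returns:
    a high R^2 suggests the candidate is aligned with pervasive risk drivers."""
    pcs = PCA(n_components=n_components).fit_transform(stock_returns)
    fit = sm.OLS(candidate_factor, sm.add_constant(pcs)).fit()
    return fit.rsquared, fit.params
```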

There are many questions regarding their performance. For the moment, let us stick to the questions relative to a proper factor model. What are the candidates for potential new factors? Should a factor model be parsimonious or not? We saw that those factors may be associated with a risk premium reading at some point. We will question below the fine difference between a risk factor and an anomaly, but how important should this performance be? On which timescale should this performance materialize? Is this performance long-standing and constant, or is it intermittent? How to monitor a decrease in performance, if any? There are of course many more questions when dealing with portfolio construction. How to combine factors within one portfolio? What are the diversification properties of those potential combinations? Is it more efficient to combine the factors in a long only or in a long-short portfolio? What are the transaction costs associated with those factors? Is it possible to time them and, if yes, is it profitable? This makes a lot of questions! At least, a constructive approach is the one of Beck et al. (2016). In this paper, the authors underline that a universal factor definition should rest on three fundamental requirements: factors should be justified by a long stream of academic literature, they should be robust to variations in their definitions, and they should perform in many countries.

Take-Home Message

The common features of a factor are that it must be persistent, robust, universal, and backed by some theory. However, what we can identify (economically or statistically) is only the projection onto observable/estimated variables of unobservable, true risk drivers.

3.3.2 So, what is the message?

The financial investment industry has long been aware that factors have an influence over individual assets’ returns. What is important to understand for now is that as the number of assets in a portfolio P grows, the return $R_t^P$ will be more and more explained by factors, and the influence of factors will rapidly become more important than that of individual assets. Malevergne and Sornette (2004) show that the existence and the appearance of factors is in fact the “result from a collective effect of the assets”: according to this view, factors are then both a “cause” and a “consequence” of asset correlation.

The essential lesson from diversification is that investors will not be rewarded for their idiosyncratic risk taking, only for their systematic risk exposure. It simply means that when the number of assets in a portfolio grows, the regression $R^2$ obtained with Equation (3.2) when replacing $R_{i,t}$ by $R_t^P$ should increase, and that it should be above 50%, as systematic risk should explain more of the cross-section of returns than individual stocks’ idiosyncrasies.
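A small simulation illustrates this point under a toy one-factor model (our own assumptions, unrelated to Equation (3.2) itself): the R² of the regression of an equal-weighted portfolio return on the common factor increases with the number of assets.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
factor = rng.normal(0.0, 0.01, T)            # common (market-like) factor returns

for n in (1, 10, 100, 1000):
    betas = rng.uniform(0.5, 1.5, n)
    idio = rng.normal(0.0, 0.02, (T, n))     # idiosyncratic noise, diversifiable
    stock_rets = factor[:, None] * betas + idio
    port_ret = stock_rets.mean(axis=1)       # equal-weighted portfolio return
    r2 = np.corrcoef(port_ret, factor)[0, 1] ** 2
    print(f"N = {n:4d}  ->  R^2 = {r2:.2f}")  # R^2 rises towards 1 as N grows
```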

We have begun to explore the meaning of terms like alpha, beta and factors without a proper definition. Let us stick in this chapter to their interpretation and semantic use in the financial industry. Even without a strong theoretical definition, it is important to keep in mind this simple statement: factors should be scalable, whereas alpha is not. The underlying notion is capacity: how much money can I load on my strategy? On factors, this should be huge; more inflows of money should not destroy the nature of the factor strategy. “Niche” strategies (alpha) generating extra performance, on the contrary, should not be scalable. Risk factors must be in line with the notion of diversification. This is in itself a first element of intuitive definition! As clearly expressed by Roncalli (2017), Beta should be associated with diversification and rule-based management, Alpha with concentration and discretionary asset picking.

3.4 Factors and the financial industry

We saw that the seminal message of the CAPM was that stock returns are primarily driven by market risk, which cannot be diversified away when holding long positions. The premium earned by holding the market portfolio is the equity premium. Yet the fraction of risk that the market explains has declined over roughly the last ten years. Ex-market risks were believed to represent only 4% of returns’ variance in 2012, but up to 10% nowadays. Investors should now be more concerned by ex-market risk, and that is what they do. Since the launch of the first fundamental factor weighted ETF in May 2010 (see Fuhr and Kelly (2011)), a tremendous number of products have been launched, backed by factor theory. And one may meet a tremendous quantity of terms to qualify those factors when it comes to their use in practice for investment, namely factor investing. Instead of factors we may find style premia, smart beta, priced risk factors or anomalies. And of course factor investing. What is behind all this? Clarifying the notions and the terms precisely is one aim of this section. We believe that the best available reference for this is Roncalli (2017), on which we will heavily rely in the following.

ABC Vocabulary

Just to illustrate the variety of terms met in the industry, MSCI uses the term Risk Premia Indices to market its indices. Conversely, Morningstar employs Strategic Beta for equivalent products, while Russell uses Smart Beta and SPDR Advanced Beta. Finally, the scientific consulting school EDHEC monitors the performance of Smart Factor Indices. This proves that all of this needs some clarification.

3.4.1 ARP or Smart Beta? Long Only or Long-Short?

We said previously that the market portfolio represents a market risk that cannot be diversified away. This is true only when we take positive positions in the investment portfolio. It is of course possible to use shorting (i.e. selling stocks short), which can be conceived as holding negative positions. Shorting is only available to sophisticated investors and is generally not possible for retail investors. Shorting comes at a cost, with other kinds of risks, but allows one to produce portfolios where the market, sector exposures, etc. can be hedged. So the question of how to optimally implement those equity factors comes with one more: should we use a long only, or a long-short approach? Of course, it is not only a question of optimality but also of what is possible for the final investor. What is interesting here is the wording associated with one or the other approach. Here again, the paper of Hamdan et al. (2016) is remarkably clear on the topic.

Smart Beta: this term refers to the long only implementation of factors. This implementation usually implies no leverage (i.e. no lending and full investment) and is mainly offered through mutual funds or ETFs. Of course, a long only implementation implies that the portfolio construction should be cautious, otherwise a strong correlation to the market is easily recovered. Let us recall something important here: the market portfolio is made of positive positions; a style factor is often made of relative bets; so a theoretical factor should end up with positive and negative coefficients on the various stocks of the universe. When this factor is used to form a long only portfolio, the decision rule to be applied is not trivial. Should we skip the stocks with a negative weight?

Should we attribute them a weight in the end? This is not trivial and explains why the diversity of Smart Beta products and ETFs on a same factor comes with various implementation rules, risk budgeting methods or alternative weightings. Factor investing, technically, is more often related to the long only implementation. However, the term Smart Beta has no official definition: when the context does not make it precise, it has to be understood as a long only implementation. Smart Beta can also be viewed as a set of competing weighting schemes that deliver a long only portfolio different from the traditional market-capitalisation weighted indices. That is what Rob Arnott states in Arnott et al. (2016), underlining the fact that the words Smart Beta are now widely used, not only to refer to passive strategies, but also to refer to any systematic or automated strategy.

Alternative Risk Premia (ARP): this term refers to the long-short implementation of factors. As recalled by Roncalli (2017), ARP are universal and not confined to equities: they can be found in all asset classes (from rates to credit, from currencies to commodities - Carry and Momentum being the ARP acknowledged to be present in each one), and correspond to long-short portfolios in opposition to “traditional” long only exposures within the corresponding asset class. A theoretically defined factor with negative coefficients is then quite close to its natural implementation as a long-short ARP. Naturally, the term alternative comes in opposition to the traditional risk premium, the equity (market) risk premium. The subtlety is that ARP may simply be anomalies. We will come back in Chapter 5 to the difference between risk premia and anomalies, but the semantic intuition is clear.

A risk premium is the reward for holding a portfolio that represents an identified risk. The risk associated with an anomaly is more difficult to highlight. Yet, with the profusion of claimed discoveries of so-called factors, identifying the source of the existence of a factor and the rationale behind it is of utmost importance. Of course, the implicit hurdle for a factor to be proposed is that it delivers, in the long term, a positive performance. The last words should be those of Blin et al. (2017b), who recall that a true risk premium should deliver positive excess returns in the long term, and should be backed either by an economic or by a behavioural rationale. What we then expect from ARP is that, being neutralized from the market premium thanks to the long-short implementation, they should give a more direct access to the underlying risk premium, and so a decreased correlation to traditional investments. They should therefore bring another dimension to the diversification of classical portfolios, even in the multi-asset case. The last thing we should expect from an ARP in this respect is to be investable and scalable (understood: after trading costs). If this property is not met, the aforementioned diversification properties would be pointless, since unrealistic for any portfolio. However, as recalled by Blin et al. (2017b), ARP cannot be understood as pure strategies with tremendous risk-adjusted return profiles.

ABC Vocabulary

There are many terms and marketing semantics. Smart Beta means mainly “Long Only” and Alternative Risk Premia means essentially “Long-Short”.

ABC Vocabulary

The returns earned through the investment in a factor should be the compensation for an identified risk that the investor supports. Otherwise, it is an anomaly.

Main Bibliography

• Hamdan et al. (2016)

• Roncalli (2017)

• Blin et al. (2017b)

3.4.2 How does the financial industry use those ideas?

We already saw in the introduction that a whole stream of business has been developed around the idea of factors, but that the distinction between alpha and beta is now widely blurred and that factors are a common lens to monitor the strategies of active managers. As recalled in Hamdan et al. (2016), ARP now represent a vast proportion of what is called active management, even if there are still “significant differences between ARP and hedge fund strategies”, not only because of implementation divergences. The implementation of ARP through indices is now plain vanilla, with the main advantage of being transparent, easy and systematic. Hamdan et al. (2016) indeed find that the “main drivers of the hedge fund industry are long exposure on developed market equities, long exposure on high yield credit and a subset of ARP”. Hedge funds still keep an edge since, in spite of this, a part of hedge fund returns remains unexplained by either traditional or alternative risk premia.

Going back to ARP, investors have to choose between investing in an index playing an ARP, or investing in an active portfolio implementing a risk premium. In practice, the choice is not easy, since (Hamdan et al. (2016) again) those two options are not perfectly equivalent, and the way they differ depends on the very nature of the underlying ARP and on the implementation (long only or long-short). The advantage of active management is that the fund manager has greater possibilities in terms of portfolio construction, premia blending, weighting and costs, to take some distance with respect to the benchmark. Of course, if all that matters are management fees for a vanilla exposure to a very well-known premium, indices have many advantages. This underlines again the fierceness of the debate between passive and active management.

3.4.3 The importance of portfolio construction

We already saw the importance of underlining the difference between a long-short and a long only implementation of factors. But this is more or less related to the needs of the final investor, interested either in the pure bet on the ARP (long-short) or in an exposure to the traditional premium tilted towards a factor (Smart Beta). However, whatever the choice, the portfolio construction (i.e. the true implementation of the factor) plays a tremendous role in the final expression of the factor. Tackling this problem in this short course is quite impossible and would demand long developments. Indeed, factors do interact with each other and this correlation has to be taken into account. We will for instance illustrate in Chapter 5 that a lazy control of the residual beta (with respect to the market) has important consequences. Moreover, the control of the risk of the final portfolio is another important topic. Generally speaking, there are two aspects. The first aspect of the problem is related to the way the equity portfolios are built, and to the final positions in dollars in each individual stock of the final portfolio. The second one is more about the distribution properties of the returns generated by those portfolios (factor by factor) and about how they can be combined within the same asset class, or across asset classes. And this, of course, without speaking about the implementation costs and the handling of the potential market impact.

The real difficulty of the first aspect is to find publicly available information, since each fund manager has more or less her own techniques to allocate a given signal. The second aspect, which encompasses the study of empirical factor returns’ distributions, is now well mastered and the literature on this point is quite developed. Indeed, in traditional theoretical models, traditional and alternative risk premia are well connected since both are conceived as a reward for a risk that an investor may take, this risk having to be defined but being more or less related to market drawdowns. This means that, conditionally on bad market states, those risk premia should be somewhat correlated. What we will illustrate in Section 5.2 is that volatility-adjusted returns of risk premia should exhibit a large third-order moment (skewness) and sometimes experience large drawdowns. This characteristic will be, as we will see, a feature helping to identify risk premia with respect to anomalies.

Main Bibliography

General references on portfolio allocation may be found in the book of Grinold and Kahn (2000). For references more related to factors, one may refer to Israel et al. (2017), Lezmi et al. (2018) or Meucci et al. (2015). Let’s note also that the reading of the book of Roncalli (2013) may be very profitable for all topics related to risk control and risk budgeting. Finally, complementary readings and discussions on specific factor-beta management may be found in Huij et al. (2014) and in Ciliberti et al. (2019).

3.4.4 The performance puzzle

When studying risk premia, a natural puzzle appears, strongly related to portfolio construction: the diagnosis of their performance. As noted in Hamdan et al. (2016), one main difference between traditional and alternative risk premia is that it is very difficult to pin down the performance of the latter, which is “dependent on several parameters, such as the asset universe, scoring method, weighting approach and trading implementation”. This explains the profusion of factor discovery attempts, which are more or less papers where academics compare their various backtests. For traditional risk premia (like the equity premium for instance), the performance is quite easy to discuss: the traditional market-capitalisation weighted indices are available whatever the geography, and the results do confirm their existence, for a long time now (see the introduction). But in general, factor strategies have very different return distribution features, which really depend on the strategy, as we will see in Section 5.2. Final investors should in fact know beforehand what they need: whether they want to invest in a factor for its absolute return, for its diversification properties, for its decorrelation in bad times or in the tails, etc. This really depends on the investment objective, conditionally on what the investor already holds.

Let us take the example of the Momentum factor. This point is heavily discussed in the cornerstone paper of Daniel and Moskowitz (2016), to which we refer. In this reference, the authors study in detail the distribution of the Momentum factor’s returns. The factor is known for having the simplest definition and probably corresponds to a behavioural bias. In spite of its simplicity (buy the winning stocks, sell the losing ones), the pattern of the return distribution is disturbing. The average return of the factor is high, but “there are relatively long periods over which momentum experiences severe losses or crashes”. Moreover, the authors show that the return distribution is asymmetric in the sense that the returns earned by the long and the short legs of a Momentum portfolio are very different.

3.4.5 Factors or anomalies?

The landscape seems quite clear at this point. Risk Factors are the systematic portfolios that explain the return variance in the cross-section. A Risk Premium is the compensation earned by an investor for holding a portfolio equivalent to a risk factor and accounting for a non-diversifiable risk. This compensation is a positive average return, even if the amplitude of this return and the horizon of its realization may greatly vary. As a consequence, a Market Anomaly is a portfolio which exhibits a positive return that is not a risk premium (no risk interpretation). Cochrane (1999) already discussed the difference between risk premia and (potentially arbitrage-based) anomalies. In the case of a risk premium, investors may not trade it, thus contributing to the persistence of the high reward associated with it. Alternatively, any easy arbitrage should be traded as soon as it is detected, making the anomaly disappear. Lo (2004, 2016) proposes to base the semantic difference on a sustainability criterion: while a “sustainable risk premium may be available to investors for a period of time, given the financial environment”, a “behavioural bias premium may be sustainable” in a more permanent way. Uncovering premia or anomalies is satisfying when we find a semantic interpretation that allows us to tell a story, identify investment patterns and interpret sources of performance. Factor identification should originally be based on risk explanation power, but the success factors encounter in practice or in the literature is paradoxically also linked to their performance and their potential interpretability; we already gave such examples above. Both risk premia and anomalies fall under the umbrella of Alternative Risk Premia as discussed before.

The most interesting observation in this semantic split is to link this separation to return distributions. Is it possible to build some simple intuition, for a given “factor”, to assess whether it is a risk premium or a behavioural anomaly? We recall here the framework used in Lempérière et al. (2015), where insights are proposed in order to differentiate risk premia from market anomalies. The main idea is that a risk premium is probably more related to skewness than to volatility, as a reward for tail risk. Why? Because, as detailed in the paper, positive expected returns attract potential investors and increase prices, but may also lead to crowdedness, hence to a greater risk of crash, consequently materialized by a greater downside tail risk. Crowdedness may also reduce capacity and future returns. The anomaly generally ends up stabilizing “(...) around an acceptable skewness/excess return trade-off”. The authors provide a general definition of a risk premium in terms of skewness: it compensates for negative extreme events rather than for “simple” volatility, the relation between effective Sharpe ratio and skewness being quite linear.

This pattern is however quite debated. Indeed, Risk Premia “almost unanimously exhibit excess kurtosis” (Blin et al. (2017a)). For the fourth moment, there is not much discussion. But the stylized facts around skewness (third moment) are more subject to debate (see e.g. Koijen et al. (2018)). The fact that risk premia are associated with high absolute skewness is more or less accepted. The fact that this skewness is negative is supported by Roncalli (2017), who even talks about skewness risk premia for denoting non-market anomalies. The argument is that risk premia are in general able to generate positive returns, whereas when the risk they compensate materializes, crashes occur with large but rare drawdowns in bad times. This generates heavy left-tailed return distributions, hence negative skewness and high kurtosis. Market anomalies have in general lower returns, but are less prone to large drawdowns (even if the crashes of Momentum may be spectacular, see Daniel and Moskowitz (2016)). Blin et al. (2017a), in contrast, find that even if risk premia are a compensation for extreme risks, the occurrence of negative skewness is not compulsory and, in this respect, depart from the findings of Lempérière et al. (2015). Finally, using ARP may reduce the volatility of a portfolio through diversification, but the tail risk may be increased due to their particular profile.

Remark 3.6 In practice, the idea of Lempérière et al. (2015) is simple: with a series of observed returns $(r_t)_{t=0,\dots,T}$, order them by their absolute value in increasing order, and plot the cumulated sum of signed returns (the final cumulated value of the series should therefore be equal to $\sum_{t=0}^{T} r_t$). Any factor qualifying as a risk premium, associated with negative tail risk, will show within such a graph a very specific behaviour for the highest absolute returns, which appear to be mostly negative in the area of the graph relative to the most extreme absolute returns.
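A minimal sketch of this diagnostic, under our reading of the procedure: rank the returns by increasing absolute value and cumulate the signed returns; the daily_returns input is hypothetical.

```python
import numpy as np

def skewness_profile(returns):
    """Rank returns by increasing absolute value and cumulate the signed
    returns. A pronounced drop at the right end of the resulting curve
    (largest |returns| mostly negative) hints at negative tail risk, i.e. a
    risk-premium-like profile in the sense discussed above."""
    r = np.asarray(returns, dtype=float)
    order = np.argsort(np.abs(r))        # increasing absolute value
    return np.cumsum(r[order])           # last value equals the sum of all returns

# Hypothetical usage: plot skewness_profile(daily_returns) against the rank of
# |return| and inspect the right end of the curve.
```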

Remark 3.7 More recently, Cho (2019) proposed a model where arbitrageurs, loading on common equity anomalies, turn them into factors.

Chapter 4

Intermezzo : Backtesting

We need at this point to say a word about what a backtest is. We already talked about the performance puzzle of equity factors, and we will come back to the potential hazards of data mining in Sections 6.1 and 6.2. In Chapter 5 we will review factors one by one. We make a little break here to tackle a more general problem, which is backtesting. Backtesting is a general concept that applies whatever the sophistication of the factor, strategy or anomaly. It is just about having a proper procedure and attitude to deal with financial returns’ time-series.

We call a backtest a simulated strategy on historical data applied to fictional money. Such an approach is quite universal and helps to determine whether a strategy is statistically significant: if not, there is no need to trade it. Backtesting may be difficult for two reasons. First, one should identify the statistical objects at play: all the performance metrics are statistical variables, and for each strategy we observe one and only one realisation on a given set of data. The second reason is more subtle: it is tightly linked with the motivations of the strategy designer. It is tempting to go live with a strategy which works well in simulation. One needs in practice to look at all the potential drivers of performance: do we have enough data? Is there any efficient risk management? The aim of this chapter is to provide insights in order to assess whether a given strategy is trustworthy enough to be traded for real. One should be able, at the end of the chapter, to progressively build a quantitative examination of a strategy in order to assess whether:

• the (potential) simulation framework of the strategy has no bias;

• what are the key features of the strategy seen as a statistical time series;

• the strategy is really performing, in the sense that it is statistically significant;

• it is possible to compute precisely the main performance measures of interest, potentially annualized.

Notations

We will note ˆRt a time series of random financial returns. The unconditional distribution of returns through time will be denoted R and ˆrtt> 0;T ¥ the observed time series of returns. ˆRt is not asked to be stationary but we assume that its moments are constant in time. We will note ˆPt a time series of observed quantities where stationarity is not expected or has no sense: prices, cumulated or integrated returns, etc. ˆRt will have to be understood throughout this chapter as a time series of one-dimensional random return, representing the historical (after costs) return of a fund, strategy or portfolio, either ob- served, reported or simulated. The nature of the return (arithmetic or geometric) depends on the context and is in general precised. In this respect, rf may still be considered as a risk-free rate.

4.1 Backtesting

4.1.1 Providing accurate backtests

First, we have to define the term backtest. A backtest is the use of historical market data to evaluate now how a given strategy has, or would have, performed in the past. The most natural way to create a backtest is to recreate the strategy under test in the past, to assess its significance. This exercise suffers from many biases (see below) but it is the only way for a practitioner to test her or his strategy on “real” (but not live) data. This highlights a difference between finance and experimental sciences: it is not possible in finance to re-do several tries with a free parameter, all others being fixed. The true difference between finance and other fields is the repeatability of the experiment. It is impossible at two different dates to face identical market conditions, moreover in an environment that also changes technologically: repeatability in the conditions of the experiment will never be met. Backtests are therefore imperfect approximations, but practitioners are forced to deal with fixed past data and to evaluate their strategies on the past.

However, in the end, we only see winning financial strategies: it is quite rare to come across a professional presentation that promotes losing strategies! Any manager willing to attract new investors will try to sell strategies with a realized benefit. We often see, however, those presentations with cautionary messages like “past performance is not the guarantee of future results” or “past returns are not indicative of future performance”. The fact that it is difficult to provide a statistically meaningful backtest does not mean that anyone is allowed to do anything! The Securities and Exchange Commission (SEC) cautiously tries to monitor funds in order to check that they do not promote investments that are built on the basis of false, wrong or careless information and processes. The SEC is even concerned by performance promotion that is just ambiguous enough, but does not necessarily step into prosecution in that case: it delivers in such situations no-action letters that are very insightful (see Lo (2016)). It is important to keep in mind that, for statistical and legal reasons, it is crucial to produce serious and coherent backtests including costs, fees, dividends and taxes, with a precise modelling of potential losses.

The aforementioned forewords are required for regulatory matters, but in statistical terms, we should be worried about seeing only positive past-performing strategies! If past performance is no guarantee of future benefits and is uncorrelated with them, we should be worried about not seeing losing strategies more frequently! It is quite impossible to assess the future profitability of a given strategy. At best, we can observe the past performance of the strategy, understand why it seems to work, and try to monitor the performance statistics and the portfolio analytics.

4.1.2 In/Out-Of-Sample

One of the main risks that finance practitioners face is over-fitting. Using the whole set of data at hand, it is tempting to optimize over and over the parameters driving the strategy, in order to improve its backtested performance. This will give disappointing results when the strategy is used live afterwards. A simple sanity check is to split the dataset into two parts, one for learning and one for evaluation. The learning sample is used to optimize the parameters of the strategy; applying the strategy to it delivers figures and performances that are called in-sample. Applying the strategy to the second dataset, the one dedicated to evaluation, provides out-of-sample performance figures. There are alternative ways to split datasets. One may for instance learn on one geographical zone and test the obtained strategy on the other zones. One could also learn a strategy on one asset class and backtest it on other asset classes. One can also collect more data in the past and extend, afterwards, the size of the dataset. The most common approach is to learn on past data and test on more recent data; the out-of-sample set is then the more recent part of the data. An out-of-sample backtest on past data is always a pseudo-out-of-sample test. Observed past data has been affected or generated by past trades and other participants; fictitious trades in the past would have had an effect that is, by definition, unobserved. This is the reason why there is often a degradation between backtested performance and effective performance. Therefore, a simple strategy with a small but decent performance may be preferable to a strategy with promising results that has not been tested live. This also explains why investors trust funds that have been live for sufficiently long, rather than funds with a short existence but impressive backtests.

Cross-validation Cross-validation is a way to assess the quality of estimation and is related to prediction purposes. Ideally, it would be great to have more datasets to evaluate the validity of the estimated parameters, in order to check that the estimated model behaves well when applied to unseen data. With a set of inputs $X$ used to predict outputs $Y$, the strategy designer has estimated a function $f_{\hat{\theta}_1}$: $\hat{\theta}_1$ is a parameter that drives $f$ and which has been estimated on an observation set $S_1$ of variables $X(S_1)$. With a new set of data $S_2$ of $(X,Y)$ pairs, one hopes that $f_{\hat{\theta}_1}(X(S_2))$ is sufficiently close to $Y(S_2)$. In that case, one is ensured that the final model is able to generalize and that over-fitting will be avoided. In this respect, rolling regression may appear as a crude and basic form of cross-validation, since one estimates a model on rolling data. However, it is not strictly speaking cross-validation, since the estimation method does not change and the behaviour on the new dataset provides no feedback to the estimation method. A true cross-validation procedure should in fact involve a partitioning of the initial dataset, where each partition is made of subsets with no overlap: one training set (in-sample) and one set for validation (out-of-sample), whose union may or may not represent all the data. Partitioning again and again, validation sets (resp. training sets) may of course share data with previous partitions. The usual practice is to use the results obtained on each partition to give feedback on estimation, for instance by averaging, with an appropriate weighting rule, the parameters obtained on the various partitions. This approach allows one to reduce the out-of-sample error of the model and increases its forecasting ability in situations where data is scarce. There are of course many ways to perform cross-validation (exhaustive or not, repeated, n-fold, leave-n-out, etc.) and we refer to Hastie et al. (2009).
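A schematic sketch of the kind of partitioning described above; the fit and score functions are hypothetical placeholders standing for the estimation of the strategy's parameters and their out-of-sample evaluation:

```python
import numpy as np

def k_fold_partitions(n_obs: int, k: int = 5):
    """Split indices 0..n_obs-1 into k non-overlapping validation folds;
    the training set of each partition is the complement of the fold."""
    folds = np.array_split(np.arange(n_obs), k)
    for fold in folds:
        train = np.setdiff1d(np.arange(n_obs), fold)
        yield train, fold

# Hypothetical usage, with user-supplied fit(data) -> params and
# score(params, data) -> out-of-sample error:
# errors, params_list = [], []
# for train_idx, valid_idx in k_fold_partitions(len(data), k=5):
#     p = fit(data[train_idx])
#     params_list.append(p)
#     errors.append(score(p, data[valid_idx]))
# final_params = np.mean(params_list, axis=0)   # averaging rule, as in the text
```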

4.1.3 Biases

Biases are implicit, hidden or unseen, and the main drawback is that they are always there! Some biases are really insidious and their exhaustive description is in general impossible. We will try to list here some of the most common biases that one should check in a backtesting procedure. A pre-built procedure to handle them does not exist and sometimes being simply conservative is more appropriate than seeking predetermined solutions.

4.1.3.1 So many biases!

The introduction of future information is a common yet spurious bias to check systematically. In some cases, future information is easy to notice since it generates a tremendous and unrealistic performance in the backtest (a Sharpe ratio of 20.0 for instance), but it is not always the case. In those situations, it is generally easy to check for the unintended introduction of future information. Dealing with past data at date $t$, an example of the use of future information would be a trading signal at date $t-2$ that includes information from date $t-1$ (which would not have been truly available at that time). The term used in this case by data providers is that the data they provide is point-in-time. Point-in-time means that the historical data used now as past data was really available in the past at the date and time indicated. A simple example: say we deal with a strategy executed once a day, using daily refreshed data. The hour at which the data was refreshed in the past is necessary, since if the data provider was not able, in the past, to deliver the file before the daily decision, the strategy may lose all its interest. The hour of delivery of the data in the past thus drives the credibility of the backtest.
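A minimal pandas sketch of the discipline this implies (hypothetical column names): re-index every data item on the timestamp at which it was actually available, and lag signals before using them in a trading decision.

```python
import pandas as pd

def make_point_in_time(raw: pd.DataFrame) -> pd.Series:
    """raw: DataFrame with hypothetical columns 'value' (the data item) and
    'available_at' (the timestamp at which it could really have been used).
    Re-index the data on its availability date rather than its reference date."""
    return raw.set_index("available_at")["value"].sort_index()

def lag_signal(signal: pd.Series, lag: int = 1) -> pd.Series:
    """Shift a daily signal so that the position taken at date t only uses
    information available strictly before t."""
    return signal.shift(lag)
```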

Survivorship bias is another kind of bias. It may be conceived as an unwanted use of future information in simulations. Let us assume that one is testing a stock selection strategy on a pool of stocks, with the goal of selecting stocks for a portfolio construction exercise. If one only considers stocks traded today, or even stocks that are currently part of an index, one will potentially obtain a very interesting strategy... since one will be investing in stocks that are known to have performed sufficiently well in the past, because the companies they represent have survived (or performed sufficiently well to enter the index composition)! The solution mainly involves infrastructure and work on data: it necessitates building dynamic investment pools, including living, dead or badly performing stocks. Stocks have to be considered dynamically inside this pool in order to get a clear and exhaustive picture of the real potential of the strategy.
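A minimal sketch of such a dynamic (point-in-time) pool, assuming a hypothetical membership table with listing and delisting dates:

```python
import pandas as pd

def universe_at(date, membership: pd.DataFrame) -> pd.Index:
    """membership: DataFrame indexed by stock, with hypothetical columns
    'start' and 'end' ('end' is NaT for stocks still alive). Returns the set
    of stocks actually investable at `date`, dead companies included when due."""
    date = pd.Timestamp(date)
    alive = (membership["start"] <= date) & (
        membership["end"].isna() | (membership["end"] >= date)
    )
    return membership.index[alive]
```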

One may also face operational biases, which all depend on the particular infrastructure of the practitioner. A straightforward example is the storing of orders and the modelling of execution costs. Another operational bias is linked to the way work is organized. It does not directly impact the estimated gain, but keeping track of all the backtests and attempts that do not work is rarely done, even if it should be. Keeping all the results of strategies, whether they have worked or not, will help to understand any intriguing behaviour of a future strategy under test.

A final bias is overconfidence in... computers! Remember that not so long ago, even in the 80s, computers were not able to compute so fast, and processing information was slow and costly. A backtest on arbitrary past data is not always relevant: the technological possibilities of today do not account for the slow access to information of some decades ago. This is rather a cautionary message on the use of technology than the description of a bias.

4.1.3.2 The “in-sample trap”

Leinweber (2007) presents an unusual exercise: can we explain the variations of the S&P500 index by trying a sufficiently high number of variables, potentially unrelated to financial information? The paper became famous since the author delivers a provoking message: we can explain the variations of the index (in a very satisfying statistical fashion) with Bangladesh butter production, US cheese production and the US plus Bangladesh sheep population! The example is funny but in fact highlights a very deep concept. On a fixed universe and with an increasing number of tests, it will always be possible to find artificial strategies, parameters or variables that deliver positive performance on a given dataset. Such an approach is called data-snooping. Sullivan et al. (1999) define data-snooping in the following way: “Data-snooping occurs when a given set of data is used more than once for purposes of inference or model selection. When such data reuse occurs, there is always the possibility that any satisfactory results obtained may simply be due to chance rather than to any merit inherent in the method yielding the results.” The biases due to data-snooping are very well known and are often highlighted in financial statistical studies, rather than in financial textbooks. Historically, biases due to data-snooping have been modelled and quantified by e.g. Lo and MacKinlay (1990) or Diebold (2006).

The general concept that encompasses data-snooping is statistical overfitting. Statistical overfitting deals with the elaboration of a model or a strategy “that owns a level of complexity which is higher than the reality has” (in the words of Bailey et al. (2015)). An overfitted model is typically not parsimonious and has a very low reproducibility; it is useless in the sense that it is only able to describe the training data. If one tries a lot of different versions of the same backtest, one will often find a version of this backtest that works best. It is always tempting to retain that strategy and trade it for real; one may then be very disappointed by its out-of-sample performance. In order to figure out the difficulty of being discriminating with one's own results, we can recall the heuristics given by Bailey et al. (2015) on random strategies. Building random signals, they give an interesting empirical rule: with 5 years of daily data and roughly 45 trials, the best strategy selected in this experiment will have a Sharpe ratio of roughly 1.0 with high probability. Another disturbing feature is that overfitted models tend to generate out-of-sample negative performance and losses rather than zero performance!
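This heuristic can be reproduced with a small simulation (our own sketch, not the exact experiment of Bailey et al. (2015)): generate purely random daily strategies over five years and keep the best in-sample Sharpe ratio.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_trials = 5 * 252, 45            # 5 years of daily data, 45 trials

# Zero-mean random "strategies": any positive Sharpe ratio is pure luck
returns = rng.normal(0.0, 0.01, size=(n_trials, n_days))
sharpes = np.sqrt(252) * returns.mean(axis=1) / returns.std(axis=1, ddof=1)
print(f"Best in-sample annualized Sharpe over {n_trials} trials: {sharpes.max():.2f}")
```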

4.2 Performance statistics

The previous section described the potential biases of backtesting, yet we still need a definition of performance measures. It is important to consider the generic nature of what a strategy is: a time series of returns. We want to highlight the statistical nature of the performance measures, which are statistical estimators with their own specificities. We will rather study some of them precisely than try to list every possible one. Indeed, our aim is not to compare the metrics between them, nor to promote one rather than another in itself.

Numerical illustration We will illustrate the concepts presented in this chapter with the study of two fictitious strategies, which are anonymous, paper-trading exercises.

• The first one, called “Strategy 1”, illustrated in Figure 4.1, runs from 2011/01/01 to 2014/12/31.

• The second one, “Strategy 2”, illustrated in Figure 4.2, runs from 2013/01/01 to 2014/12/31.

The only thing to know is that both strategies represent the cumulated gain, with no reinvestment. For Figures 4.1 and 4.2, the plain line is the total return (cumulated sum) and the dashed line on each figure represents the zero-volatility projection of profit if one assumes that the daily return is equal to the mean over the sample.

4.2.1 Sharpe ratio

To assess the profitability of a given strategy, one of the most widespread metrics is the Sharpe ratio. Its initial name, as originally presented in Sharpe (1966), is the reward-to-variability ratio. It rapidly gained in practice the name of its inventor. The Sharpe ratio is the ratio of the effective return of the strategy scaled by its volatility. The rationale of its use and success is twofold. First, let us observe that there is no natural total order on R². This means that for a given couple of real numbers, we are not guaranteed to be able to compare it with another couple of real numbers if no additional criterion is involved. In financial terms, if one is given a set of assets, it is impossible to sort them along the couple of first moments of their return distribution, all other things equal, without a defined rule. If (µ̂1, σ̂1) and (µ̂2, σ̂2) are the mean return and volatility of two assets 1 and 2, we are not certain to be able to order them. We would prefer asset 1 to asset 2 if it provides more gain with less risk (µ1 > µ2 and σ2 > σ1), but how to compare two assets such that µ1 > µ2 and σ1 > σ2? The Sharpe ratio does not

solve this, however it aggregates the two concepts in one scalar criterion, as it increases with return and decreases with volatility. The second idea behind the use of the Sharpe ratio is that it is directly rooted in Sharpe's CAPM theory. Indeed, in the (volatility, return) plane, the Sharpe ratio of an underlying asset is the slope of the line that joins the asset with the origin (or with the intercept as soon as we include a risk-free asset in the computation). Maximizing the Sharpe ratio is equivalent to maximizing this slope, which should ultimately be dominated by the slope of the capital market line joining the tangency (optimal) portfolio with the origin (or the intercept). It is therefore adapted to an “efficiency” reading in the perspective of finding efficient portfolios.

Figure 4.1: Strategy 1 - 2011/01/01 to 2014/12/31. Plain line: cumulated return with no reinvestment. Dashed line: drift of the strategy. The Maximum Drawdown period is also indicated along with moments of the daily return distribution.

Even if the Sharpe ratio does not tell everything, we dedicate a whole part of the chapter to its study since it is almost always monitored, published and compared. But as simple to compute as it may appear, it relies on some assumptions that are most of the time forgotten or simply never explored. Let us assume that we deal with a strategy that provides a return series (R_t) on [0; T], a T-length period, where R_{t+1} is the return between t and t+1. If µ = E[R_t] and σ² = V[R_t], the theoretical expression of the Sharpe ratio SR as a random variable is defined as:

\[
SR = \frac{\mu - r_f}{\sigma},
\]

with r_f a fixed constant rate. In the following we will assume that r_f = 0 for the sake of simplicity of notations; this does not change anything in the computations. If we take a Brownian model for the diffusion of the returns of the strategy, the Sharpe ratio is the ratio of the drift of the diffusion scaled by its diffusion coefficient (volatility). To get a precise estimation of the Sharpe ratio, we should plug into its definition the values of the diffusion parameters (drift and volatility). It would be equivalent to assume that the returns of the strategy are independent and identically distributed with theoretical mean

Figure 4.2: Strategy 2 - 2013/01/01 to 2014/12/31. Plain line: cumulated return with no reinvestment. Dashed line: drift of the strategy. The Maximum Drawdown period is also indicated along with moments of the daily return distribution.

µ and variance σ². Even if this i.i.d. assumption is very unlikely to hold in practice, the heuristics of the use of the Sharpe ratio by practitioners are often implicitly related to this framework. Of course, the moments of the return distribution are never available, and the most natural way to compute the Sharpe ratio empirically is to replace the theoretical moments by their sample counterparts, that is to say:

\[
\hat{\mu}_T = \frac{1}{T}\sum_{t=1}^{T} r_t, \qquad \hat{\sigma}_T^2 = \frac{1}{T}\sum_{t=1}^{T}\left(r_t - \hat{\mu}_T\right)^2, \qquad \widehat{SR} = \widehat{SR}_T = \frac{\hat{\mu}_T}{\hat{\sigma}_T}.
\]
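A minimal Python sketch of these sample counterparts is given below; the synthetic daily returns are illustrative and are not the data behind the strategies of this chapter.

\begin{verbatim}
import numpy as np

def sharpe_estimate(returns):
    """Sample counterparts of mu_T, sigma_T and the per-period Sharpe ratio,
    with a zero risk-free rate as assumed in the text."""
    r = np.asarray(returns, dtype=float)
    mu_hat = r.mean()
    sigma_hat = r.std(ddof=0)     # 1/T normalization, as in the formula above
    return mu_hat, sigma_hat, mu_hat / sigma_hat

# illustrative call on synthetic daily returns
rng = np.random.default_rng(1)
mu_hat, sigma_hat, sr_hat = sharpe_estimate(rng.normal(1e-4, 8e-4, size=1043))
print(f"estimated daily Sharpe ratio: {sr_hat:.3f}")
\end{verbatim}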

Numerical Example 4.1 We compute in Table 4.1 the moments of the distribution of daily returns of Strategies 1 and 2, whose cumulated gains are represented in Figures 4.1 and 4.2. In particular, µ̂_T is the drift of each curve of cumulated performance when seen as a random walk, and the slope of the dashed line in Figures 4.1 and 4.2. Strategy 2 has a mean return that is more than ten times smaller than that of Strategy 1, but with a much reduced volatility, resulting in a daily Sharpe ratio that is roughly 60% of that of Strategy 1.

Strategy     Nb of days T    Mean Daily Return µ̂_T    Daily Volatility σ̂_T    Daily SR
Strategy 1   1043            8.95e-05                 8.22e-04                0.109
Strategy 2   522             5.03e-06                 7.71e-05                0.065

Table 4.1: Daily moments of Strategies 1 & 2.

Time scale and aggregation

A crucial observation is that any measure of the Sharpe ratio remains an estimator and therefore a statistical object, computed as the ratio of two estimators: the expected return and the standard deviation of the same random return. But we have to observe that the main random variable under study is the one-period return of the strategy. It is therefore important to understand that any property, feature or test based on the Sharpe ratio depends on the assumptions made on the returns and on the time scale of the observed realizations (i.e. the unit of the time measure between t and t+1).

When looking at a backtest, one needs comparable figures. Practitioners are interested in annualized Sharpe ratios. But the annualization procedure depends both on the properties of the return time series and on the granularity or frequency of the series (daily, weekly, monthly, i.e. whether the time between t and t+1 is a day, a week or a month). Under the former definition, we are in fact concerned with a normalized definition of the Sharpe ratio. Another possibility is to define the Sharpe ratio in a non-normalized version which is extensive with time. In this case, the drift is replaced by the total return over [0; T] and the volatility by the total volatility over the whole period. Under the i.i.d. assumption, the total return should be replaced by µT and the volatility by σ√T, the Sharpe ratio becoming:

\[
\widehat{SR}_{ext} = \frac{\hat{\mu}\, T}{\hat{\sigma}\sqrt{T}} = \widehat{SR} \times \sqrt{T}. \qquad (4.1)
\]

The √T factor is related both to the length of the period and to the data on which the Sharpe ratio is measured. Indeed, one could measure a Sharpe ratio with daily data but express it at a different frequency. Let us assume that one deals with daily data over three years, with 750 data points. The daily Sharpe ratio would be estimated on 750 points, giving a figure $\widehat{SR}^d$. The extensive Sharpe ratio over the total period would be approximately equal to $\widehat{SR}^d \times \sqrt{750}$. However, if one deals with a Sharpe ratio $\widehat{SR}^a$, estimated on the 750 points of daily data but already annualized, the multiplying factor is not √750 but √3, as 3 is the number of years of the period for a Sharpe ratio measured in years. Dealing with a monthly Sharpe ratio $\widehat{SR}^m$, the multiplying factor should be √36. This is related to annualization (see Section 4.3.1), since annualization is already an aggregation in time.

Numerical Example 4.2 We can compare the aggregation effect on the first moments of the return distribution of Strategies 1 and 2 to the daily figures given in Table 4.1. The drift (mean return) of each strategy is positive, and the mean and volatility of the returns are extensive with the scale (month, year) of the measure. With a monthly aggregation, we obtain Table 4.2. With an annual aggregation, we obtain Table 4.3. The Sharpe ratios obtained in Table 4.3 now have values whose order of magnitude may be more familiar to professional investors. Figures for Strategy 2 with yearly aggregation are (voluntarily) clearly flawed since computing a mean and a standard deviation on 2 points is meaningless: the yearly Sharpe ratio of Strategy 2 is very appealing but in fact totally pointless. Using yearly aggregated calendar data for Strategy 1 is hazardous, but for Strategy 2 it makes no sense at all.

Strategy     Nb of months M    Mean Monthly Return µ̂_M    Monthly Volatility σ̂_M    Monthly SR
Strategy 1   48                1.95e-03                   4.14e-03                  0.469
Strategy 2   24                1.094e-04                  2.65e-04                  0.412

Table 4.2: Monthly moments of Strategies 1 & 2.

Link with t-statistic

Why is this important? In fact, the multiplying term in Equation (4.1) gives a whole statistical sense to this measure. Indeed, $\widehat{SR}_{ext}$ is the expression of the Student t-statistic used to test whether (R_t) is issued (or not) from the draws of a random variable with zero mean. The interest is straightforward, since computing the Sharpe ratio in its extensive version encompasses the very test of the significance of the strategy under study! With i.i.d. Gaussian returns, the distribution at finite distance T is a (biased¹) Student t-distribution with T-1 degrees of freedom. In the following, we still assume without loss of generality that r_f is equal to zero, but we also drop the T indexation of the estimators of µ. $\widehat{SR}_{ext}$ follows a non-central t-distribution whose non-centrality parameter is √T(µ/σ). The statistical subtlety here is that those parameters are not observed: they define the theoretical, unobserved centrality parameter of the law that will help us to test the significance of the strategy. Should µ be equal to zero, the t-distribution would be central. In other fields, the quantity √T(µ/σ) is often called the signal-to-noise ratio, which quantifies the proportion of real signal (strength of the drift) among noisy observations (random, volatile returns, quantified here by the volatility).

Numerical Example 4.3 Using the previous context, the t-statistics related to the series of returns of Strategies 1 and 2 are given in Table 4.4. If we want to compare each strategy with a random walk, we should compare its t-value with the relevant quantiles of the t-distribution. The null hypothesis is then:

H0: µ = 0 against Ha: µ ≠ 0. With this kind of test, Strategy 1 appears to be significant and different from pure noise. Indeed, the 97.5% quantile of the t-distribution with 1042 degrees of freedom (number of observed returns minus 1) is equal to 1.962, inferior to the observed t-statistic of 3.52 (with a p-value around 2e-4). The null hypothesis is then strongly rejected and Strategy 1 appears, with this simple test, as being not pure noise and statistically significant.

For Strategy 2 however, even if its Sharpe ratio looked decent, the strategy is simulated on a period of time that is too short to assess its statistical significance. The associated t-statistic is much too weak: this strategy is not statistically significant. For this reason, we drop this strategy in the remainder of the chapter.

As a conclusion, from a statistical viewpoint, what matters is not the value of the Sharpe ratio itself but the product of the Sharpe ratio with the square root of the number of observations, since this quantity is linked to the value of a t-statistic. We will see in the following that the study may be refined in various ways (including the study of higher moments to derive precise tests). Yet, using the extended Sharpe ratio for an initial guess of the significance of the strategy is a simple yet robust approach.

¹For a precise overview of the statistical nature of this link, and a discussion on the bias √(T/(T-1)) that technically appears at finite distance, see Miller and Gehr (1978).

Strategy     Nb. of years Y    Mean Yearly Return µ̂_Y    Yearly Volatility σ̂_Y    Yearly SR
Strategy 1   4                 2.33e-02                  1.33e-02                 1.74
Strategy 2   2                 1.31e-03                  3.67e-04                 3.57

Table 4.3: Yearly moments of Strategies 1 & 2.

ˆd º ˆd Strategy Nb of days T Daily SRà t T  SRà Strategy 1 1043 0.109 3.52 Strategy 2 522 0.065 1.48

Table 4.4: t-statistics related to Strategies 1 & 2.
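A minimal sketch of this test, assuming the scipy library is available, is given below; it reproduces the figures of Table 4.4 and compares them with the Student quantile used in Numerical Example 4.3.

\begin{verbatim}
import numpy as np
from scipy import stats

def sharpe_t_test(per_period_sr, n_obs, alpha=0.05):
    """Two-sided test of H0: mu = 0 using t = sqrt(T) * SR
    (i.i.d. Gaussian case)."""
    t_stat = np.sqrt(n_obs) * per_period_sr
    critical = stats.t.ppf(1 - alpha / 2, df=n_obs - 1)  # e.g. 97.5% quantile
    p_value = 2 * stats.t.sf(abs(t_stat), df=n_obs - 1)
    return t_stat, critical, p_value

# figures of Table 4.4
for name, sr, T in [("Strategy 1", 0.109, 1043), ("Strategy 2", 0.065, 522)]:
    t_stat, critical, p_value = sharpe_t_test(sr, T)
    print(f"{name}: t = {t_stat:.2f}, threshold = {critical:.3f}, "
          f"p-value = {p_value:.1e}")
\end{verbatim}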

4.3 Statistical significance of performance

4.3.1 Sharpe ratio annualization

We have already given some insights on the computation of the Sharpe ratio in Section 4.2.1. We insisted on the fact that this quantity is computed at the same frequency as the returns of the strategy. This has two consequences. First, if we want to annualize the Sharpe ratio for comparison purposes, it is preferable to take into account the statistical properties of the return series to obtain precise annualized figures. Second, if we want to run some statistical tests, we have to manipulate the correct statistical objects, on the same time scale as the returns used to compute the Sharpe ratio.

4.3.1.1 Intuition

Annualization has in itself no statistical purpose. Computing the estimated Sharpe ratio is in fact a stand-alone procedure. But how to compare two backtests that are not on the same time scale? It is not possible to compare performance levels of two strategies with, on one hand, say, intra-day strategies, and on the other hand macro bets with monthly liquidity. Therefore, the most reasonable time scale on which to convert performances is the yearly frequency, which has implicitly become a business standard for comparing performances. The most common way to annualize a Sharpe ratio (i.e. to set the Sharpe ratio to an annual unit whatever the time-frame of the data used for computation) is to multiply the computed Sharpe ratio by the square root of the frequency ratio Ny between the year and the frequency of the data. For instance, an estimated Sharpe ratio with monthly (respectively weekly, daily) data has a frequency ratio of Ny = 12 (resp. 52, 250) and has to be multiplied by √12 (respectively √52, √250) to get an annualized value of the Sharpe ratio. For daily data, 250 is the number of business days in one year. Values such as 252, 256 or 260 may also be found and must generally be given explicitly. This is in fact only true in the case of returns that are independent through time. Annualization is a time-aggregation of Ny returns and consists in finding the statistical properties of the statistical object:

\[
R_t^{(y)} = \sum_{t'=t}^{t+N_y-1} R_{t'}.
\]

In the case where (R_t) is made of independent, identically distributed returns, the distribution of R_t^{(y)} has a mean equal to Ny × E[R] and a variance equal to Ny × V[R]. Therefore the Sharpe ratio of the aggregation of R_t over Ny periods is scaled by √Ny.

Numerical Example 4.4 We compare in Table 4.5 the annualized values using either daily, monthly or yearly estimators of mean returns. For Strategy 1, one sees that whatever the technique (aggregating then estimating, or estimating then rescaling by √Ny), the values are quite equivalent.

Strategy     √250 × $\widehat{SR}^d$    √12 × $\widehat{SR}^m$    $\widehat{SR}^y$
Strategy 1   1.72                       1.62                      1.74

Table 4.5: Comparing annualized Sharpe ratio estimates for Strategy 1.

4.3.1.2 Using statistical properties of the return series

It is important to understand that any annualization process is an aggregation, not a mean, and scales with Ny. Remember that the annualization step and the choice of the annualization coefficient are left to the practitioner, who has to do things properly and must therefore understand the statistical challenges of the computation step. In fact, not only the frequency of the data but also the statistical properties of the series matter. When returns are no longer assumed to be i.i.d. but still stationary, E[R^(y)] remains equal to Ny × E[R], but the variance of R^(y) no longer scales linearly with Ny. The expression of V[R^(y)] is given e.g. in Lo (2002) and becomes in this case:

Ny1 ˆy 2 V R ¥ Ny  σ Š1  2 Q ˆNy  jρj, (4.2) j 1 where ρj is the j-th autocorrelation coefficient of R at its original frequency level. Therefore with an à ˆÂ  ˆ  estimated Sharpe ratio SR, and autocorrelation coefficients ρj j> 1;Ny¥ on a series Rt , the more ˆy general expression of the annualized Sharpe ratio ˆSRà  is:

ˆy N SRà ¼ y SR.à (4.3)   PNy 1ˆ  Â Ny 2 j 1 Ny j ρj

In practice, autocorrelation orders decay quite slowly for common strategies, so if Ny is large, we may estimate the autocorrelation coefficients up to a finite number of orders only, skipping the estimation of higher-order coefficients. If the strategy is not autocorrelated, we see simply that all the ρ terms disappear and that we are left with the former expression $\widehat{SR}^{(y)} = \sqrt{N_y}\, \widehat{SR}$.
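A minimal Python sketch of Equation (4.3) follows; the autocorrelation estimator, the maximal lag and the 250-day convention are illustrative choices, not the only possible ones.

\begin{verbatim}
import numpy as np

def annualized_sharpe_lo(returns, n_periods_per_year=250, max_lag=None):
    """Annualized Sharpe ratio following Eq. (4.3) (Lo, 2002).

    Autocorrelations are estimated up to max_lag only (higher orders are
    skipped, as suggested in the text); with no autocorrelation the factor
    collapses to sqrt(n_periods_per_year)."""
    r = np.asarray(returns, dtype=float)
    sr = r.mean() / r.std(ddof=0)
    Ny = n_periods_per_year
    n_lags = min(max_lag if max_lag is not None else Ny - 1, Ny - 1, len(r) - 2)
    rho = np.array([np.corrcoef(r[:-j], r[j:])[0, 1]
                    for j in range(1, n_lags + 1)])
    weights = Ny - np.arange(1, n_lags + 1)
    return sr * Ny / np.sqrt(Ny + 2.0 * np.sum(weights * rho))

# sanity check on i.i.d. noise: close to sqrt(250) times the daily Sharpe ratio
rng = np.random.default_rng(2)
r = rng.normal(2e-4, 1e-2, size=2500)
print(annualized_sharpe_lo(r), np.sqrt(250) * r.mean() / r.std(ddof=0))
\end{verbatim}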

4.3.2 Testing significance with the Sharpe ratio

Inference on a single Sharpe ratio from a given backtest is critical and should be part of any quantitative manager's toolbox in order to assess the true performance and interest of a given backtest. We have already stated that under the assumption of Gaussian i.i.d. returns, the extensive Sharpe ratio on T periods follows a t-distribution with T-1 degrees of freedom. This allows us to derive a simple significance test where we test:

\[
H_0: \mu = 0, \qquad (4.4)
\]

or in other words, the nullity of the drift of the return process. In this case, for a test at the α significance level, the test statistic is directly the extensive Sharpe ratio and has to be compared to:

• a one-sided 1-α quantile of the central t-distribution with T-1 degrees of freedom if the alternative hypothesis is Ha: µ < 0 or Ha: µ > 0;

• a two-sided quantile of the central t-distribution with T-1 degrees of freedom if the alternative hypothesis is Ha: µ ≠ 0.

Of course, we could conduct this test with any other constant; this test is related to the significance of the drift, where the Sharpe ratio appears naturally as the test statistic. The other possibility is to test the Sharpe ratio directly. A test of SR ≠ 0 is of little use since, in that case, testing for the nullity of the drift is conceptually equivalent. It is better to test against a fixed value SR1 of the normalized Sharpe ratio:

\[
H_0: SR = SR_1 \quad \text{vs} \quad H_a: SR > SR_1, \qquad (4.5)
\]

and in that case the test statistic is still the extensive Sharpe ratio, which has to be compared to the one-sided quantile of the non-central t-distribution with T-1 degrees of freedom and a non-centrality parameter no longer equal to 0 this time, but equal to √T SR_1. Those tests use the finite-distance distribution of the Sharpe ratio. A subtlety is that the estimator of the Sharpe ratio is biased at finite distance, even if this bias vanishes quickly (namely, for a fixed T, the expectation of the estimator is not strictly the estimated parameter).
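A sketch of the test of Equation (4.5), assuming scipy's non-central t-distribution, is given below; the threshold value SR1 used in the example is chosen purely for illustration.

\begin{verbatim}
import numpy as np
from scipy import stats

def sharpe_threshold_test(per_period_sr, n_obs, sr_threshold, alpha=0.05):
    """One-sided test of H0: SR = SR1 against Ha: SR > SR1, as in Eq. (4.5).

    Under H0 with i.i.d. Gaussian returns, sqrt(T) * SR_hat follows a
    non-central t distribution with T - 1 degrees of freedom and
    non-centrality parameter sqrt(T) * SR1."""
    t_stat = np.sqrt(n_obs) * per_period_sr
    critical = stats.nct.ppf(1 - alpha, df=n_obs - 1,
                             nc=np.sqrt(n_obs) * sr_threshold)
    return t_stat, critical, t_stat > critical

# is Strategy 1's daily Sharpe significantly above a daily SR1 of 0.05?
# (the threshold is chosen for illustration only)
print(sharpe_threshold_test(0.109, 1043, sr_threshold=0.05))
\end{verbatim}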

However, this Sharpe ratio estimator is asymptotically Gaussian with asymptotic moments such that:

\[
\sqrt{T}\left(\widehat{SR} - \frac{\mu}{\sigma}\right) \;\longrightarrow\; \mathcal{N}\left(0,\; 1 + \frac{\mu^2}{2\sigma^2}\right).
\]

µ and σ are again the true values of the drift and the volatility of the underlying process. Such an expression is given both in Jobson and Korkie (1981a) and in Lo (2002) (an equivalent obviously exists for the t-statistic and is available in Johnson and Welch (1940)). This asymptotic law is useful since we can also test the significance of a given Sharpe ratio at the α confidence level and get confidence intervals for the Sharpe ratio. It will also allow us (see below) to provide tests when we need to relax the assumption of Gaussian returns, and consequently to get an alternative to the use of the t-statistic, since in the case of strongly skewed returns, for instance, the use of the Student t-test would be irrelevant. For an estimated Sharpe ratio $\widehat{SR}$, the confidence interval at the α level is the segment:

\[
\left\{\widehat{SR} \;\pm\; q_N(\alpha)\sqrt{\frac{1}{T}\left(1 + \frac{1}{2}\widehat{SR}^2\right)}\right\},
\]

where q_N(α) is the bilateral α-quantile of a standardized, centered Gaussian distribution. At this stage we deal with statistical objects; this implies that such tests are to be used on the most frequent data. So if the Sharpe ratio is estimated on T daily points, the quantities to plug into the test have to be the daily Sharpe ratio, not any annualized version on another time scale. Of course, such a test is simple to conduct since having the estimated Sharpe ratio is sufficient. A test like in Equation (4.5) means that we just have to check whether or not SR_1 belongs to the estimated confidence interval.

But what happens when we relax the assumption of Gaussianity of the returns? The asymptotic framework preserves the possibility to conduct such significance tests. If we still assume the returns to be i.i.d., the asymptotic variance is given by Opdyke (2007). The expression is corrected in the following way:

\[
\sqrt{T}\left(\widehat{SR} - \frac{\mu}{\sigma}\right) \;\longrightarrow\; \mathcal{N}\left(0,\; 1 + \frac{1}{2}SR^2 - \gamma_3 SR + \frac{\gamma_4 - 3}{4}SR^2\right), \qquad (4.6)
\]

where again SR = µ/σ, and γ_3 and γ_4 are respectively the skewness and the (absolute) kurtosis of the distribution². Here again, the asymptotic variance has to be estimated since SR will be replaced by its estimator. The test remains the same, only the bounds of the confidence interval change:

\[
\left\{\widehat{SR} \;\pm\; q_N(\alpha)\sqrt{\frac{1}{T}\left(1 + \frac{1}{2}\widehat{SR}^2 - \hat{\gamma}_3\widehat{SR} + \frac{\hat{\gamma}_4 - 3}{4}\widehat{SR}^2\right)}\right\}.
\]

Contrary to the Gaussian case however, the estimated Sharpe ratio is not sufficient and we also need to estimate γ_3 and γ_4. In the Gaussian case, the two corrective terms vanish and we end up with the former expression.

The more general, and maybe more empirical, case deals with the situation where returns are no longer i.i.d. at all. The main assumption that has to hold to compute asymptotic terms is that the returns are stationary. Consequently, even in the case where returns show some autocorrelation or heteroskedasticity, Lo (2002) derives the asymptotic properties of the Sharpe ratio using a GMM method. The estimator remains asymptotically Gaussian and unbiased, yet the computation of the asymptotic variance is far more complex³ and is left to the reader.

Numerical Example 4.5 We compute in Table 4.6 an estimation (up to the T factor) of the asymptotic variance of the estimator of the Sharpe ratio provided in Equation (4.6). This factor is interesting since it shows how far we are from Gaussianity and what we would miss by using only the t-statistic to test for the significance of the strategy. At first order, using only the t-statistic is equivalent to using 1 as an approximation for the full variance 1 + ½SR² - γ₃SR + ((γ₄-3)/4)SR² of (4.6).

Strategy     $\widehat{SR}$    γ̂_3     γ̂_4     1 + ½$\widehat{SR}$² - γ̂_3 $\widehat{SR}$ + ((γ̂_4-3)/4) $\widehat{SR}$²
Strategy 1   0.109             0.011   3.28    1.005

Table 4.6: Asymptotic variance estimator for Sharpe ratio for Strategy 1.
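The following sketch computes the two confidence intervals (Gaussian i.i.d. and moment-adjusted), assuming scipy for the skewness and kurtosis estimators; the synthetic heavy-tailed returns are illustrative only.

\begin{verbatim}
import numpy as np
from scipy import stats

def sharpe_confidence_interval(returns, alpha=0.05, adjust_for_moments=True):
    """Asymptotic confidence interval for the per-period Sharpe ratio.

    With adjust_for_moments=False the Gaussian i.i.d. variance 1 + SR^2/2 is
    used; otherwise the skewness/kurtosis correction of Eq. (4.6), with
    gamma_3 the skewness and gamma_4 the absolute (non-excess) kurtosis."""
    r = np.asarray(returns, dtype=float)
    T = len(r)
    sr = r.mean() / r.std(ddof=0)
    variance = 1.0 + 0.5 * sr**2
    if adjust_for_moments:
        g3 = stats.skew(r)
        g4 = stats.kurtosis(r, fisher=False)   # absolute kurtosis
        variance += -g3 * sr + 0.25 * (g4 - 3.0) * sr**2
    half_width = stats.norm.ppf(1 - alpha / 2) * np.sqrt(variance / T)
    return sr - half_width, sr + half_width

# illustrative call on synthetic, heavy-tailed daily returns
rng = np.random.default_rng(3)
r = 1e-4 + 8e-4 * rng.standard_t(df=5, size=1043)
print(sharpe_confidence_interval(r))
\end{verbatim}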

Main Bibliography

• Sullivan et al. (1999)

• Hastie et al. (2009)

• Lo (2002)

• Bailey and Lopez de Prado (2012)

• Bailey et al. (2014)

• Bailey et al. (2015)

• Lo (2016)

²See again Section 2.5.4 for a discussion on skewness and kurtosis estimation. We voluntarily keep the γ_3 and γ_4 notation here to recall that they have to be understood and estimated as the third and fourth moments of the return distribution.
³See Appendix A of Lo (2002) for the complete expression. Opdyke (2007) also extends this exercise to the non-i.i.d. case. However, in this precise setting, results are subject to caution since they are dismissed by Ledoit and Wolf (2008).

Chapter 5

Main Equity Factors

Figure 5.1: Ranking of Smart-Beta Exchange Traded Products in the US by theme, as of end of 2018 - Source: Bryan et al. (2019)

In this section, we want to describe in more detail the factors that are the most representative in the equity literature. Figure 5.1 wraps up the relative importance, in terms of assets, of the exchange-traded products in the US, as of 2018. We recover here all the usual suspects. We choose to follow only some of them, the most famous ones: Value, Momentum, Size, Low Vol (most probably called Dividends in Figure 5.1) and Quality. For each factor or anomaly, we will describe its features, the proxy variables used to build the portfolios, its biases and drivers, and attempts at explanations around the risk or behavioural idea, to guess whether each factor is really a factor or an anomaly. For each anomaly, we will in addition run some simulations (as described in the portfolio construction box below) to illustrate its statistical behaviour. An in-depth study is provided in Section 5.2 to describe the correlations and links between those strategies. Individual performance is provided in each specific section, before a global look at the cross-sectional structure of those five factors. For an exhaustive illustration of the behaviour of the factors in the academic literature, one may refer to Arnott et al. (2019) for detailed figures.

Portfolio Construction The simulations made in this section are very crude. We build conceptual portfolios based on simple normalized signals s. For each factor and each simulation pool, a signal s is built for the N stocks of the pool at the date of simulation. Each signal s for each factor on each pool is then a real-valued vector of size N with components typically between -1 and 1.

We will be interested in 5 geographical zones: US, Europe, Japan, Canada, Australia. As a short-cut, to save space in the figures, UL will stand for the US zone, EL for Europe, JL for Japan, CL for Canada and PL for Australia. But how do we define a geographical zone? Here are the criteria for the stocks that enter our study. We follow dynamically a pool of stocks on each zone, whose composition is updated every three months, on the first business day of January, April, July and October. We add daily filters on the data to retain stocks that are sufficiently liquid. For this, at each rebalancing, we consider the average value of the 3-month average daily liquidity (obtained as close price times traded daily volume). We moreover require at least 200 available prices in the last 300 business days, and at least one available volume in the last 100 business days. We target the following composition: US: 1000 most liquid stocks; Europe: 1000 most liquid stocks; Japan: 500 most liquid stocks; Canada: 200 most liquid stocks; Australia: 200 most liquid stocks.

We simulate over roughly 15 years, from the 1st of January 2004 up to the 1st of January 2019.

Long-Short portfolio construction - There is no portfolio construction per se since we start, for each factor, with a signal s that is homogeneous to a relative view of the factor, which fits with a Long-Short view. We simply translate our signal into positions. Therefore, our portfolio construction does not involve any transaction costs, direct or indirect, leverage financing, dividend tax, volatility rescaling, or anticipation of the speed of the signal. The only anticipation of the gain is made through the signal, which is our proxy of future performance. This differs, for instance, from the Fama-French approach (as described in Section 3.2.5) since we use the whole signal (no bucket or quantile involved here) and each simulation on a factor is independent from any other factor: we do not use one factor conditionally on the others.

Hedged version - Our portfolio construction is very crude and may have a residual exposure to the market. This may be anecdotal for some factors, but critical for some others. For this we use a simple orthogonalisation with respect to the market. Taking the signal s, a volatility vector σ (made of the individual volatilities of the stocks) and a proxy of the market through the first eigenvector λ_0 of the PCA of the correlation matrix of the assets (see Section 9.1.4 to better understand this aspect), the three vectors s, σ and λ_0 have the same size (equal to the number of assets). We then compute the hedged version of s, noted s_H, where divisions and products are component-wise:

\[
s_H = \sigma \left( \frac{s}{\sigma} - \left(\frac{s}{\sigma}\cdot\lambda_0\right)\lambda_0 \right).
\]
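A minimal numpy sketch of this orthogonalisation is given below. It assumes that λ_0 is normalized to unit Euclidean norm (which is usual for a PCA eigenvector but is stated here as an assumption), and the four-stock example is purely hypothetical.

\begin{verbatim}
import numpy as np

def hedge_signal(s, sigma, lambda0):
    """Orthogonalize a raw signal to the market mode, as in the formula above.

    s       : raw factor signal, one component per stock
    sigma   : individual stock volatilities
    lambda0 : first eigenvector of the correlation matrix (market proxy),
              assumed here to have unit Euclidean norm
    Divisions and products are component-wise, as in the text."""
    s, sigma, lambda0 = (np.asarray(x, dtype=float) for x in (s, sigma, lambda0))
    z = s / sigma                                   # risk-normalized signal
    z_hedged = z - np.dot(z, lambda0) * lambda0     # project out the market mode
    return sigma * z_hedged

# tiny example with 4 hypothetical stocks and an equal-weight market mode
s = np.array([0.8, -0.2, 0.5, -0.9])
sigma = np.array([0.20, 0.30, 0.25, 0.40])
lambda0 = np.ones(4) / 2.0                          # unit norm in dimension 4
print(hedge_signal(s, sigma, lambda0))
\end{verbatim}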

This is of course not relevant for the Long Only version: as the Long Only positions are already very aligned with the market, “hedging” the Long Only portfolio would result in a portfolio that is no longer Long Only.

Long Only portfolio construction - We will also provide a Long Only version of each signal s, even if s is initially between -1 and +1. To transform a Long-Short factor into a Long Only portfolio, we use a function that is conservative on the positive components of s and attenuates its negative components. An ideal candidate for such a function needs to be smooth, monotonous, and introduce little bias in the factor construction. If we want to attenuate the negative part of the factor, it would be too extreme to set the corresponding stocks to zero positions. The Z-transform is a realistic implementation of such a function. s is assumed to be between -1 and 1: the support of s is not at all similar to the one of a reduced, centered Gaussian distribution. To deal with this, we rescale s between -3 and 3. This kind of function is heavily used by banks to provide Long Only versions of factors through ETFs. The Z-transformed factor s_Z is therefore:

\[
s_Z = \mathbf{1}(s < 0)\, \frac{1}{2(1 - 3s)} + \mathbf{1}(s \geq 0)\, \frac{1 + 3s}{2}. \qquad (5.1)
\]

Then we end up with a sZ signal with all components positive. We can therefore, as in the Long-Short version, translate this signal directly into a portfolio.
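The sketch below implements the piecewise form of Equation (5.1) as reconstructed above; the exact attenuation function used in practice may differ, so this is an illustration of the idea (linear on the positive part, attenuated but strictly positive on the negative part) rather than a reference implementation.

\begin{verbatim}
import numpy as np

def long_only_transform(s):
    """Long Only attenuation of a Long-Short signal in [-1, 1].

    Piecewise form of Eq. (5.1) as reconstructed in the text: continuous,
    increasing, equal to 1/2 at s = 0, strictly positive everywhere."""
    s = np.asarray(s, dtype=float)
    return np.where(s >= 0.0, (1.0 + 3.0 * s) / 2.0,
                    1.0 / (2.0 * (1.0 - 3.0 * s)))

print(long_only_transform(np.array([-1.0, -0.5, 0.0, 0.5, 1.0])))
# most negative scores get small positive weights, best scores the largest ones
\end{verbatim}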

Worldwide version - For each portfolio construction, after having simulated the factor and obtained a time series of returns, we get a worldwide version by summing a weighted version of those returns, with the following weights: 50% for the US, 30% for Europe, 10% for Japan, and 5% for Australia and Canada respectively. This is an unconditional weighting, which does not take into account the potential relative evolutions of the market capitalisations of one zone versus the others.

“Vol scaled” - “Vol scaled” returns are artificial returns computed once the returns of the time series of the factor portfolio are obtained. They are simply the same returns normalized by their past, trailing 250-day standard deviation (volatility). The aim is simply to get an idea of the returns of a portfolio where a constant risk is targeted.
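A short pandas sketch of this rescaling, assuming the factor returns are held in a daily Series; the one-day shift (so that only past information is used) and the absence of an explicit volatility target level are simplifying assumptions.

\begin{verbatim}
import pandas as pd

def vol_scale(factor_returns, window=250):
    """Rescale daily factor returns by their trailing realized volatility.

    factor_returns : pandas Series of daily returns of the factor portfolio."""
    trailing_vol = factor_returns.rolling(window).std().shift(1)
    return factor_returns / trailing_vol
\end{verbatim}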

Market returns - Whatever the portfolio construction, “Market” will always refer to the same portfolio. It is simply the aggregated, mean return of all stocks in the (geographical) pool at the time of the simulation, weighted by the relative market capitalisation of each stock (delayed by two days to avoid an endogenous weighting). The “market” is then, in this section, always “Long Only” and computed on each geographical zone. The worldwide version of the market is also, as before, a flat unconditional mean of those pool-dependent market returns.

Let us underline that our conceptual portfolios, whether they are Long Only, Long-Short, Market or rescaled, present returns without reinvestment.

Remark 5.1 Unit-factor portfolios - In fact, a popular¹ method to build Long-Short factor portfolios is to define what are called unit-factor portfolios. Let us assume that we observe a matrix β_t at time t, known thanks to information on stocks. It is an N × K matrix where N is the number of stocks and K the number of factors, which can be fundamental factors along with dummies, i.e. one-hot variables (0/1) indicating whether the stock belongs or not to an industry or a country (with as many columns as industries). Each row is then related to the characteristics of an individual stock. Let us denote by R_t the N × 1 return vector. The factor model is therefore:

\[
R_t = \beta_t F_t + \epsilon_t
\]

(with β_t potentially lagged in time to avoid problems due to contemporaneous estimation). In this context, F_t is unobserved and represents the return of one unit of a factor. What is powerful here is that this unit-risk vector will provide, at the level of portfolio holdings (not in returns), an orthogonalized version of each factor portfolio with respect to the others. Elementary algebra may also help to show that the inclusion of industry dummies may help to get cash-neutral portfolios.

The factors F_t are indeed estimated as $\hat{F}_t = \hat{H}_t' R_t$ with $\hat{H}_t$ a matrix (of holdings, hence the letter H) such that $\beta_t' \hat{H}_t = I_K$. $\hat{H}_t$ should be interpreted as a transition matrix between the world of factors and the world of stocks. The expression of $\hat{H}_t$ is in fact given by the GLS estimator of the former regression:

\[
\hat{H}_t = \hat{D}_t^{-1}\beta_t\left(\beta_t'\hat{D}_t^{-1}\beta_t\right)^{-1},
\]

where $\hat{D}_t$ is the diagonal matrix of individual stock variances. In fact, D may be replaced by the covariance matrix of the stocks (up to a potential regularization). Note that even if the unit-factor portfolios are orthogonalized in terms of portfolio holdings, they are not orthogonal in terms of factor returns, since the covariance of F_t is not trivial.
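The construction can be sketched in a few lines of numpy, as below; no regularization of D is applied and the random β matrix is purely illustrative.

\begin{verbatim}
import numpy as np

def unit_factor_portfolios(beta, idio_var, stock_returns):
    """Unit-factor portfolio holdings H and factor return estimates F_hat.

    beta          : N x K matrix of stock characteristics (plus dummies)
    idio_var      : length-N vector of individual stock variances (diag of D)
    stock_returns : length-N vector of stock returns R_t
    Returns (H, F_hat) with beta' H = I_K and F_hat = H' R_t."""
    beta = np.asarray(beta, dtype=float)
    D_inv = np.diag(1.0 / np.asarray(idio_var, dtype=float))
    H = D_inv @ beta @ np.linalg.inv(beta.T @ D_inv @ beta)
    return H, H.T @ np.asarray(stock_returns, dtype=float)

# small random illustration: 6 stocks, 2 factors
rng = np.random.default_rng(4)
beta = rng.normal(size=(6, 2))
H, F_hat = unit_factor_portfolios(beta, rng.uniform(0.01, 0.05, 6),
                                  rng.normal(0.0, 0.01, 6))
print(np.round(beta.T @ H, 10))    # numerically the K x K identity matrix
\end{verbatim}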

5.1 Presentation of main factors

5.1.1 Value

Discovery

1977. Basu (1977) identified the fact that value-related variables may help to understand deviations from the CAPM. It was later established that stocks with high ratios of book value to market capitalisation have higher returns (see Basu (1983), Rosenberg et al. (1985), De Bondt and Thaler (1987), Chan et al. (1991)); the effect was of course included in the 3-factor model of Fama and French (1992), along with Size and Market.

5.1.1.1 Description

To explain the Value factor, we first need to detail what Book Value is. For a given company, book value is a financial figure, readable in the financial statements. It is the difference between the assets of the company and its liabilities. Alternatively, it may be called Total Equity. The book value, as its name suggests, is the value of the company according to its financial books and statements. If the company were to be liquidated (bankruptcy or not), it would be the amount that shareholders would receive in proportion to the share of the equity of the company they own. In case of bankruptcy, debtors (as recorded by the liabilities) and corporate bond holders are served first and shareholders come

1This method is quite standard and used in major investment banks.

afterwards. And what they should get back in this case is represented by the book value. Conceptually, the book value is, or should be, of the same order of magnitude as the market value of the company. The market value incorporates the subjective view of investors on the future value of the cash flows of the company. If the market value of the company is greater, this means that investors have a positive view of the prospects of the company, and that the company is worth more than its books say today, since the price reflects the value of all its potentialities and prospects. The core idea of the Value factor is that it spots stocks whose effective market value does not reflect the fundamental value of the underlying company. The factor then aims to capture the excess returns of stocks that have low market prices relative to this fundamental information. The factor is called HML in the Fama-French semantics, this acronym standing for High Minus Low. But this fundamental information is still to be defined. The attempt to capture this fundamental value usually relies on book value (most of the time under the classical definition), but it may also be done through other fundamental quantities taken from the financial statements of the company: earnings, income, sales, cash-flows, dividends, profits, etc. Screening the fundamental information through those various metrics aims at buying the stocks that are under-valued (hence “cheap”) and selling the stocks that are over-valued (hence “expensive”). The pitch is pretty simple: buying good stocks with high fundamentals at a cheap price will lead to higher returns in the future as soon as investors correct their valuation. The effect is documented to be a 3- to 6-month effect (Ang et al. (2009a)).

5.1.1.2 Explanations

Two fields of explanations exist: one related to risk (which would really make Value a risk factor), and a behavioural one, making Value an anomaly rather than a risk premium.

Vocabulary

In order to better understand the general explanations given in this chapter to support either a risk or an anomaly interpretation, one needs to focus on the underlying mechanism used by academics. Each explanation tries to identify a source of desirability, a bias in the behaviour of investors, a shift of their focus, etc. Afterwards, a loop is at play where, in the words of Asness et al. (2019a), “desirable means “excess demand” which means “higher price” which means “lower expected return””. This general loop explains how a bias in rationality may lead to the existence of a return premium.

Risk explanation - One explanation is that stock prices move because of exogenous news or shocks. Those shocks are driven by systematic risk, depending on the business cycle. Value and Growth firms (Growth firms being defined as the opposite of Value firms) react differently to such shocks. Value firms are said to be riskier in bad times, due to the fact that they are generally more mature businesses that cannot quickly redirect their activity towards more profitable segments. This feature is called “costly reversibility” and accounts for an anti-cyclical pricing of risk. They show “high and asymmetric adjustment costs” (Ang et al. (2009a)). Value firms are also said to have a cyclical sensitivity towards market risk. When the market is up, growth firms invest and expand their activities to benefit from the positive trend in the economy. Value firms are riskier in bad times, when the price of risk is increased. The proponents of this explanation may be found among Cochrane (1991, 1996) or Zhang (2005). Other risk explanations may be found in Fama and French (1992) or in Berk et al. (1999), where Value stocks are principally identified as stocks related to financial distress. The impact of bad market states is amplified for those stocks. The literature is however quite controversial on the risk explanation for Value.

Behavioural explanation - The explanation that is commonly given is that the factor arises because of errors in the expectations of investors, who are potentially attracted by more glittering strategies (see Lakonishok et al. (1994)). Investors are maybe too optimistic when evaluating growth companies, and too pessimistic when evaluating value companies, creating an undervaluation of Value companies, with prices returning to equilibrium as soon as expectations adjust. Value firms in this framework are not riskier but only cheaper, being simply mis-valued by investors who react in an inappropriate manner to news. For another behavioural explanation, see Barberis and Huang (2008).

Let us note that in Lempérière et al. (2015), Value-HML stands as an outlier among many factor candidates. The skewness of Value's returns is slightly positive, as opposed to Size and Momentum for instance, and the largest returns are positive, contributing positively to the gains. It is therefore difficult to justify that Value is a risk premium. The intuition given in the paper is that stocks with a high book-to-price ratio are typically thought of as defensive stocks, while stocks with a low book-to-price ratio are potentially riskier bets. The results are however not robust to monthly aggregation, since daily returns are positively skewed and monthly returns negatively skewed. A boosting effect could be that Value firms are typically “losers”, that is, stocks having performed poorly in the past. If too much importance is given to this losing trend, investors amplify the de-valuation, the stock becoming excessively under-valued.

5.1.1.3 Simulation and Performance

Portfolio Construction The raw signal is built as the Book Value of the stocks scaled by their market capitalisation. Book Value is simply the trailing 12-months Total Equity, in dollars. Market capitalisation is computed by a rolling mean over 100 days, lagged by 20 days, of the market capitalisation in dollars. We use the number of outstanding shares to compute market capitalisation. We then apply a flat ranking of this ratio to obtain a signal between -1 and 1.
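A pandas sketch of this signal construction is given below. The data layout (a Series of trailing Total Equity and a DataFrame of daily market capitalisations), the variable names and the exact mapping of the flat ranking onto [-1, 1] are illustrative assumptions.

\begin{verbatim}
import pandas as pd

def value_signal(total_equity, market_cap):
    """Book-to-market Value signal, flat-ranked to [-1, 1] across the pool.

    total_equity : Series indexed by stock (trailing 12-month Total Equity, USD)
    market_cap   : DataFrame of daily market capitalisations (dates x stocks)."""
    mcap = market_cap.rolling(100).mean().shift(20).iloc[-1]  # 100d mean, 20d lag
    ratio = total_equity / mcap
    ranks = ratio.rank()
    return 2.0 * (ranks - ranks.min()) / (ranks.max() - ranks.min()) - 1.0
\end{verbatim}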

The result of our simulation is available in Figure 5.2. Not coming as a surprise, all along the numerical simulations of this chapter, the Long Only implementations of the factors are quite close to the market. This is often the case with such implementations, due to the common market mode (no short position). The Long Only simulation appears more as a tilt of the market towards the factor. We will therefore comment more deeply on the Long-Short implementation. We see that the performances are dreadful. This is not really a surprise, since the factor has been losing a lot for some years now. Japan appears however as an exception. This is amplified by the fact that the timespan of our simulation is quite short (2004-2019), and the Value factor was said to work better before this period. In fact, first concerns about the reliability of the performance of the factor came in the 90s, raised by Black (1993). Even on the website of Kenneth French, one can download the results of the Value factor in the data library and see that the factor is performing poorly. See the following website: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/index.html.

Main Bibliography

• Basu (1977)

• De Bondt and Thaler (1987)

• Fama and French (1992)

Figure 5.2: Performance of a simulated Value factor. Period: 2004-2020.

5.1.2 Momentum

Discovery

1993. The discovery of the factor dates back to 1993 and is generally attributed to Jegadeesh and Titman (1993). Carhart (1997) follows closely and adds Momentum to a broader factor model.

5.1.2.1 Description

The Momentum factor exploits the simple idea that past winners are the future winners. For this reason, it is sometimes called UMD in the Fama-French lexicon, the natural translation of the factor's idea: UMD stands for Up Minus Down. Stocks with high returns within the past year tend to continue to post high returns: strong past performance would then be an indicator of strong future performance. The factor is therefore sometimes qualified as a “positive feedback strategy”. The factor generally captures the anomaly by sorting stocks along their past average return, computed usually as the mean of total returns (price plus dividends) over a long period of typically 3, 6 or 12 months, with the last month excluded most of the time. To reformulate the idea of the factor, stock prices tend to show some persistence in their movements (either negative or positive) over some typical time horizons. Jegadeesh and Titman (1993) is generally regarded as the seminal paper on the anomaly, initially identified on the US stock market. Carhart (1997) expanded in 1997 the initial factor model of Fama and French to include and categorize Momentum as a cornerstone factor. Since then, it has been identified as a universal factor across zones, countries and asset classes (see Rouwenhorst (1998), Fama and French (2008), Asness et al. (2013)). Fama and French (2012) finally agreed to add Momentum to their core factor representation in 2012, acknowledging that Value and Size were not able to capture the Momentum effect. Even if the Momentum effect is generally accepted as being transient (the effect disappears quite quickly, generally within not much more than 3-6 months), the anomaly is a long-standing one and many papers identify the fact that it may be observed for more than 200 years now (Geczy and Samonov (2016), Lempérière et al. (2014) among many more)!

When dissecting the features of the anomaly, it appears that the long-standing persistence of the anomaly

makes it probably not the result of luck or data mining. In general, the positive returns that are earned are far bigger than for the other factors presented in this chapter. However, the Momentum factor has some dreadful features. In addition to huge implementation costs (see Section 6.3) and, implicitly, a high turnover, the strategy is “prone to periodic crashes” (as explained by Barroso and Santa-Clara (2015) and Daniel and Moskowitz (2016)). This risk of quick and unpredictable reversal is the dark side of the strategy, counter-balancing the easiness of its computation and implementation. This reversal is in general very difficult to predict since it is linked to volatility, which is in itself a puzzling topic to apprehend.

5.1.2.2 Explanations

The theory explaining the rationale behind the anomaly is still at the heart of a fierce academic debate. A satisfying explanation is still missing. The theories that are generally accepted all belong to the behavioural field. In spite of a potential reward for bearing tail risk (the daily momentum returns being a compensation for tolerating sudden crashes), the risk explanation is generally not retained. The effect is more often linked to the under- or over-reaction of investors to news, as stressed by Hirshleifer (2001) and Barberis and Thaler (2003). The proponents of behavioural finance are interested in “irrational” reactions of investors that are flawed by their own over-confidence and biases. If investors tend to be too conservative, they take too much time to realize that they are, for instance, just herding with other investors, or in the middle of a loss. They may also lose focus: if, for instance, a deleveraging by a massive player induces a decrease of a stock's price disconnected from its fundamentals, a “Momentum approach” will exacerbate this move (Vayanos and Woolley (2013)). This pattern is even amplified if the deleveraging is slow and auto-correlated over time. The over-confidence is also driven by the fact that, as Momentum returns are high, investors become proud and confident of their results and exacerbate their use of Momentum strategies.

A risk explanation is available in Lempérière et al. (2015), but the pitch is a bit more subtle. It is identified that the skewness of UMD's returns is negative, driven mostly by the skewness of winning stocks (the long leg), in particular due to the 5% largest events, which contribute losses equivalent to 20% of the gains (made by the 95% other events!). This supports a Risk Premium interpretation, where the risk here is an asymmetric one, a kind of third-moment-related risk.

5.1.2.3 Simulation and Performance

Portfolio Construction The raw signal is computed as the mean over 220 days, lagged by 20 days, of the total returns (price and dividend returns). We then apply a flat ranking of this quantity to obtain a signal between -1 and 1.
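As for Value, a short pandas sketch of this signal is given below; the data layout (a DataFrame of daily total returns) and the rank-to-signal mapping are illustrative assumptions.

\begin{verbatim}
import pandas as pd

def momentum_signal(total_returns):
    """Momentum signal: 220-day mean of total (price + dividend) returns,
    lagged by 20 days, flat-ranked to [-1, 1] across stocks.

    total_returns : DataFrame of daily total returns (dates x stocks)."""
    raw = total_returns.rolling(220).mean().shift(20).iloc[-1]
    ranks = raw.rank()
    return 2.0 * (ranks - ranks.min()) / (ranks.max() - ranks.min()) - 1.0
\end{verbatim}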

The result of our simulation may be observed in Figure 5.3. All the striking features of Momentum can be observed on this simple simulation. First, we can observe that the strategy is somehow universal in spite of its simplicity. Second, it produces high Sharpe ratios (easily around 0.8) when compared to other factors. Even in the Long Only setting, we observe a gain of 50% in returns, showing that it is more than a tilt, but really a strong driver of performance. Third, let us observe that the crashes are important when they occur. On the orange curve, during the GFC in 2008, the factor erased its relative extra gains to come back to the level of the market in late 2009. On the pink curve, in the same period, the factor crashed and lost a lot. Many other sudden drawdowns exist all along the PnL path of the factor in

the Long-Short implementation. What is really interesting is that when we control for the volatility of the returns (purple curve), we see that an efficient risk control helps to recover a strong performance in the long term. Momentum is known to have an extreme risk profile, and taming this component allows one to get better risk-controlled returns. The main mechanism is that investors tend to deleverage after a huge loss, preventing them from earning extra returns if the factor shows some recovery. On the purple curve, the constant risk exposure ensures that the loss is controlled when things go bad. See e.g. Roncalli (2013) for a description of such a mechanism.

Figure 5.3: Performance of a simulated Momentum factor. Period: 2004-2020.

Main Bibliography

• Jegadeesh and Titman (1993)

• Carhart (1997)

• Fama and French (2012)

• Lempérière et al. (2015)

• Daniel and Moskowitz (2016)

5.1.3 Size

Discovery

1981. The original discovery is attributed to Banz (1981), working on NYSE stocks. Other seminal works are to be found among Roll (1981), Reinganum (1981), Christie and Hertzel (1981), Schwert (1983), Blume and Stambaugh (1983), Chan et al. (1985), Huberman and Kandel (1987), and Chan and Chen (1988).

5.1.3.1 Description

The names that are commonly used for the anomaly are Size, Small Minus Big or, equivalently, SMB. The factor captures the fact that small firms have excess returns with respect to big firms. The size of the

firm is generally measured through the market capitalisation (the product of the number of shares times the price per share). The effect is assumed to survive even after adjusting for market beta and Value. If the result is generally attributed to Banz (1981), its impact on the literature is mainly due to the fact that it is one of the three pillars of the celebrated 3-factor model of Fama and French (1992).

If the factor is really easy to describe, the puzzle arises when we stress the robustness of the factor and dive into the description of its features. First, to begin on a positive note, let us stick to the stylized facts around the drivers of the premium. The premium seems to be very concentrated and the effect is really due to a few, very small stocks providing extremely positive returns, so that those small stocks finally become big stocks (Fama and French (2007)). The effect is then due not to small, but to micro-capitalisation stocks (Fama and French (2008)), and the definition of the investment universe is then crucial. Knez and Ready (1997) estimate that the driving force of the portfolio is the lowest 1% of the stocks, which are not outliers. Crain (2011) finds a figure of 5%, and Horowitz et al. (2000a) show that to be observed, the portfolio should include, in the US, stocks with a market capitalisation under USD 5 million (which is really tiny). This indicates that the effect is really non-linear, with a limited effect on small stocks and a negligible one among mid and large stocks. Moreover, the premium shows a strong seasonality, oddly realizing the essential part of its returns in January (see Keim (1983), Lamoureux and Sanger (1989), Daniel and Titman (1997), Horowitz et al. (2000a), who put forward that nearly half of the returns are due to this January effect).

The problem with the Size factor is that even its proponents admit that the premium is, at best, intermittent. It is not very robust and lacks universality, being very dependent on the period, market and investment universe. Even the first work of Banz (1981) underlined that the Size effect in the US was extremely intermittent over the test period (1926-1975 in the paper). This is confirmed by Keim (1983), Crain (2011), Dimson et al. (2011) or Chen and Zhao (2009). At best, let us underline the work of Asness et al. (2015), which acknowledges this intermittent behaviour but stresses the fact that the premium can be enhanced by controlling for the “quality” of the stocks (see below). Doing so adds some robustness, and some universality, to the premium.

But to end on a negative note, many studies rather stress the fact that the premium is, at best, hardly significant, or has simply disappeared. Mittoo and Thompson (1990), Fama and French (1995), Dimson and Marsh (1999) were worried about the significance of the results, while Schwert (2003) finds that the anomaly probably vanished quickly after the discovery of the effect. And while Dimson et al. (2002) cast doubt on the existence of the January effect, Dichev (1998), Chan et al. (2000), Horowitz et al. (2000b) and van Dijk (2011) simply did not observe the effect. The case is more or less closed when Fama and French, in 2011, found no premium for the Size effect after re-examining returns on the 1990-2010 period, whereas the factor was a pillar of their 3-factor model.

Let us note the contribution of Ciliberti et al. (2019), which revives the study of the Size premium by keeping the idea but computing the size of a stock not through the market capitalisation, but using the daily traded volume (in currency) of the stock. This allows one to recover a more robust and more universal performance, and a factor whose returns are less anti-correlated with the Low Vol anomaly, and therefore easier to include in a portfolio allocation.

5.1.3.2 Explanations

The effect, if any, is the subject of controversial interpretations. It is clearly an open question. First, it may simply be a side-effect of data mining. As explained before, the effect is weak and intermittent

and too dependent on the definition of the investment universe. This argument is for instance put forward by Black (1993). OK, but if there were a Size effect, what could be the cause? There are as many risk explanations as behavioural ones.

A natural interpretation, as put forward by Fama and French (1993), Berk (1995) or Crain (2011), is that small firms are riskier, since they are less mature, with less financial power and less developed activities. This risk therefore needs to be compensated by higher returns. They are also more sensitive to systemic risk, as they are less robust and weaker when faced with market shocks, and are sometimes past poor performers: this is the point of Chan and Chen (1991).

The risk explanation related to the Size factor is sometimes linked to financial distress. Financial distress occurs when a firm is in such a bad state that it is not able to meet its commitments towards debtors. Even if default risk is not the main driver of the factor, it shares some information related to distress or risk of default, as stated by Vassalou and Xing (2004). This argument is also supported by Chan and Chen (1991) and Fama and French (1993, 1996). This interpretation is however discussed and rejected by Dichev (1998).

A last explanation is related to liquidity. Indeed, market capitalisation is quite related to liquidity, and the smallest stocks are the least liquid ones, hence the most expensive to trade. Size is then mechanically a costly factor to trade, even if, as market capitalisation evolves slowly, the natural turnover of such a portfolio is not so high. Amihud and Mendelson (1986) find in particular a natural link between the size of a stock and liquidity risk, measured with the bid-ask spread. The results of the paper indicate that the size effect is mainly a liquidity one. In conjunction with Ciliberti et al. (2019), the two papers are therefore two faces of the same coin.

The typical arguments given to explain the Size anomaly on the behavioural side are very similar to those used for Value. The factor is said to appear because of errors in the expectations of investors losing focus, attracted by glittering strategies. Investors under-react to news and miss some insightful information on small stocks, which catch the eye of buyers less than glamorous stocks (Lakonishok et al. (1994) again).

To conclude, the risk interpretation remains however a very probable one with the work of Lempérière et al. (2015). If risk is measured not through a symmetric measure such as volatility, but with a third-order measure, Size shows a strongly negative skewness.

5.1.3.3 Simulation

Portfolio Construction We simply compute the average over 250 days, with a 20 days lag, of the market capitalisation in dollars. We use the number of outstanding shares to compute market capitalisation. We then apply a flat inverse ranking of this quantity to obtain a signal between -1 and 1.

The result of our simulation is shown in Figure 5.4. We must be cautious about the results since it is more than likely that they are deeply affected by the crudeness of our portfolio construction for this experiment. We see clearly that for the Long Only implementation, the factor is not a tilt but a burden for returns. Indeed, the stocks in the orange curve are weighted exactly in reverse order compared to the black one, but all with positive weights. Yet, we do not benefit from the rally of big stocks in the post-2009 period, and the stocks (even if we take 1000 stocks in the US for this) are not small enough to be bargains. Therefore there is no lottery effect for this portfolio. The results for the Long-Short implementation only prove that portfolio construction for the Size factor is tremendously important. As we do not control for idiosyncratic risk (small stocks being high-volatility stocks), it is more than likely that the pink curve, which looks more or less like the black curve with a flipped sign, carries a heavily negative beta. We will come back later to this point (see Section 5.2). This shows, however, the weakness of the results of the factor and the difficulty of reproducing academic results in an empirical experiment.

Figure 5.4: Performance of a simulated Size factor. Period: 2004-2020.

Main Bibliography

• Banz (1981)

• Fama and French (1993, 1996)

• Schwert (2003)

• Crain (2011)

• van Dijk (2011)

5.1.4 Low Vol

Discovery

1975. The first studies noticing that there was something wrong with the CAPM date back to the 1970s, so we may consider that the discovery occurs with the work of Haugen in Haugen and Heins (1975), confirmed later by many studies, among which Haugen and Baker (1991). Technically, the work of Black et al. (1972) is however the first proof that the relation between return and beta was not as strong as one should expect following CAPM theory (or at least, not strong and increasing...). A little story tells that Black failed to convince a famous bank to launch a fund betting on this idea at the time...

5.1.4.1 Description

The Low Vol anomaly is very simple to describe, yet it is the most puzzling, since it is totally at odds with CAPM theory. We thought that more gain should come with more risk: for stocks, it seems to be the contrary! Low Volatility stocks are usually stocks with low beta, high market capitalisation, higher dividends and higher returns (see Chapter 7). The puzzle here is that the message is totally in opposition to the lessons of modern finance. One should indicate that we choose here to present the Low Volatility anomaly, but we should rather speak of low risk. There are several ways to seize the factor, and they are in fact quite different. One may capture this factor by anti-ranking volatility, typically measured by a one-year standard deviation of returns. One may use the total volatility (computed on raw returns) or the idiosyncratic volatility (i.e. on returns where we subtract beta times the returns of the market). Both computations yield more or less the same results, as shown in Blitz and Van Vliet (2011), since for stocks the volatility is mostly idiosyncratic. But it is also possible to anti-rank the beta to the market, or even the correlation to the market. Those two approaches (see Frazzini and Pedersen (2014) and Asness et al. (2019a) respectively) are quite similar in spirit, since we bet against a risk proxy. In fact they are not the same. As explained by Cazalet and Roncalli (2014), the two strategies are similar when they are played in leveraged portfolios (said alternatively, Long-Short) or when one assumes that the market portfolio has no positive expected risk premium. But they are not technically the same, since they induce strong differences in the way the short leg of the portfolio is structured (and incidentally on stock selection). We deliberately choose to focus on Low Volatility for the sake of concision. However, one may also speak of a “Low Risk” factor and, quite naturally, of a defensive strategy.

The seminal idea behind the Low Vol factor is totally in opposition with CAPM messages. Anticipating the rationale behind the factor, it is difficult to think that those stocks are more profitable because they bear more risk, since the opposite holds as soon as risk is measured by variance. Many papers show that this factor is alive and well, across years and countries, on developed as well as emerging markets; the literature is tremendous: Haugen and Baker (1991), Chan et al. (2000), Ang et al. (2006), Clarke et al. (2006), Blitz and Van Vliet (2007), Ang et al. (2009b), Baker et al. (2011), Chen et al. (2012), Iwasawa and Ushiwama (2013), Li et al. (2014), Novy-Marx (2014), Frazzini and Pedersen (2014). And there are many more.

Let us note in addition the findings of Bouchaud et al. (2016). The paper shows that the anomaly arises as the sum of two structural effects that are relatively independent. Low Vol loads heavily on high-dividend stocks, and dividend returns are responsible for a large part of the gains (more than two thirds). But even with this strong dividend bias, the anomaly remains strong for ex-dividend returns anyway. It is also shown that the anomaly is not driven by top or bottom deciles only, by sectors or by extreme events. In any case, it is impossible to support the claim that the strategy qualifies as a risk premium.

Anyway, a clear and final word is still missing, showing that the separation between risk premia and market anomalies may be difficult to draw, even for premia/anomalies that have stood for several decades.

5.1.4.2 Explanations

It is quite natural to say that explaining the anomaly is a puzzle. We may directly talk about an anomaly because a careful reader will naturally think that this is not a risk premium, and indeed risk explanations are missing for Low Vol. So what are the possible alternatives? One argument of Black and Scholes (1973) is that restrictions on the short leg, that is borrowing limits (costs, quantities, etc.), may constrain investors who want to invest in risky stocks. As recalled by Asness et al. (2019a), investors who want to invest a lot borrow money to do so, but they face constraints in practice. Shorting limits are such constraints, and investors cannot leverage their portfolio up to any level. In addition to shorting, there are other kinds of costs and frictions in real life which imply that not all investors have the same appetite for leverage. In consequence, investors who are less risk-averse focus on high-risk stocks, while more risk-averse investors blend their portfolios with lower-risk stocks, which are less in demand, creating an under-valuation that gives birth to the anomaly when it corrects. This explanation is however highly debated (see Ang et al. (2009a)). Another path for understanding is to investigate the role that the general level of interest rates may play, which is an ongoing topic of research.

Many attempted explanations are more on the behavioural side, see e.g. cornerstone papers on the topic such as Barberis and Huang (2008), Clemens (2012), Iwasawa and Ushiwama (2013) or Hou and Loh (2016). The most common behavioural explanation is the lottery effect. The idea of the “lottery effect” is quite natural (see e.g. Barberis and Huang (2008), Boyer et al. (2010), Garrett and Sobel (1999), Iwasawa and Ushiwama (2013), Hou and Loh (2016)): investors like small risks and big gains. Stocks with low risk have weak expected losses but rarely deliver very high gains. This is contrary to highly volatile stocks, where the probability of frequent losses is important, but which can also deliver exceptional positive gains with a non-zero probability. A lottery ticket is like that: a low probability of winning, yet potentially a very high gain. Consequently, investors are ready to pay more for such stocks, undervaluing low-risk stocks in the meantime. Academics talk about an “irrational preference for volatile stocks”. Similar stories exist that assume that low-risk stocks are overlooked with respect to other stocks. It is then a problem of investors' focus. Some researchers think that some stocks are more glittering (see Barber and Odean (2008)), attracting attention. Others propose the idea that analysts are too optimistic about high-volatility stocks, artificially driving investors towards mis-valuation of both high- and low-volatility stocks. But the idea is the same: Low Volatility stocks stay under the radar and are therefore undervalued, leading to higher returns than expected.

Other behavioural explanations do exist however, linked for instance to institutional incentives (Haugen and Baker (2012)), to the overconfidence of investors in their ability to forecast, or to asymmetric behaviours of investors depending on the state of the market. We would however propose another argument. The fact that, as developed in Bouchaud et al. (2016), the major part of the returns of Low Volatility stocks comes from dividends is quite unique. Indeed, nearly 80% of the returns of the strategy come from dividends. This means that those low vol stocks are quite regular, safe, and deliver cash flows through dividends regularly. We think that the demand for such stocks is likely to be driven by pension funds that need to regularly pay out large amounts of money (for retirement plans for instance): those stocks are perfect investments for them, allowing them to safely collect periodic amounts of cash, without any deleveraging. This assumption would of course have to be backed by further research.

5.1.4.3 Simulation

Portfolio Construction The raw signal is computed as the standard deviation, over 250 days and lagged by 20 days, of the total returns. We apply an inverse ranking of this quantity to obtain a signal between -1 and 1.
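A minimal sketch of this signal, together with the idiosyncratic-volatility variant mentioned in Section 5.1.4.1 (volatility of returns net of beta times the market), could read as follows. The input layout and the optional market argument are assumptions of this sketch; the rank-to-signal mapping is the same as for Size.

    from typing import Optional
    import pandas as pd

    def low_vol_signal(returns: pd.DataFrame, market: Optional[pd.Series] = None,
                       window: int = 250, lag: int = 20) -> pd.DataFrame:
        """Inverse-rank signal in [-1, 1] from lagged rolling volatility."""
        if market is not None:
            # Rolling beta of each stock on the market, then residual returns.
            beta = returns.rolling(window).cov(market).div(
                market.rolling(window).var(), axis=0)
            vol = (returns - beta.mul(market, axis=0)).rolling(window).std()
        else:
            vol = returns.rolling(window).std()          # total volatility
        rank = vol.shift(lag).rank(axis=1, pct=True)
        # Inverse ranking: the least volatile names get the most positive signal.
        return -(2.0 * rank - 1.0)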

It is known that the Low Vol factor is, along with Size, the factor whose interplay with the others is the most sophisticated to master. Portfolio construction plays a tremendous role. That is the main reason why normalizing in risk (or not), or hedging the market (or not), may have a considerable impact on the final results. The inherent exposure of the Low Vol factor to the other factors is often highlighted. Bali and Cakici (2008) spot in particular the correlation of Low Vol with Size. Novy-Marx (2014) identifies the bias of high-volatility stocks towards small and growth stocks. However, the results shown in Figure 5.5 are not so bad. What appears first is again the universality of the factor, which leads to positive returns in every geographical zone, with homogeneous Sharpe ratios around 0.2-0.4. The risk control also helps a bit. In the Long Only case, the effect is huge, with more than a 50% increase in Sharpe ratio. This should come mainly from the importance given to high-dividend-paying stocks and the fact that we do not include dividend taxes (but this is also the case for the market, so the comparison is fair).

Figure 5.5: Performance of a simulated Low Vol factor. Period: 2004-2020.

Main Bibliography

• Haugen and Heins (1975)

• Haugen and Baker (1991)

• Chan et al. (2000)

• Clarke et al. (2006)

• Blitz and Van Vliet (2007)

• Ang et al. (2009b)

• Novy-Marx (2014)

• Frazzini and Pedersen (2014)

• Bouchaud et al. (2016)

5.1.5 Quality

Discovery

2000. The first quantitative approach to measuring quality may be found in the paper of Piotroski (2000). Technically, the work of Sloan (1996) came before, and accruals are now recognized as a way to approach Quality. However, Benjamin Graham, one of the pioneers of Value investing, was already aware of Quality investing, which he considered as another side of Value.

5.1.5.1 Description

Whereas fundamental analysis (looking at core fundamental information and financial statements of a company at a detailed level) has long been an approach adopted by stock pickers and discretionary managers, it is quite recent for quantitative asset managers. Therefore, the discovery of Quality as a factor is pretty recent and is generally attributed to Novy-Marx (2013) or Asness et al. (2019b). The Quality factor aims to target “quality” stocks, anticipating that those stocks will have higher returns. Those quality stocks are generally stocks of companies with low debt, stable income materialized by high levels of cash earnings, stable growth in dividends, stable earnings and incomes, a strong balance sheet, etc. The metrics to measure Quality are various and numerous: defining those metrics is the hard part of the process, and they generally differ across asset managers, even if the variables just listed are the most common measures of quality. They should measure stability, profitability, maturity and competitiveness. For sure, the practical implementation of Quality takes multiple forms. This is moreover one argument of those who oppose qualifying Quality as a factor. It is also sometimes put forward that it is already more or less behind the core idea of Value: focusing on good stocks (that are moreover cheap for Value). Quality targets core quality whereas Value targets temporary relative mis-valuation, but the initial ideas are somehow related. Warren Buffett is said to target such quality stocks quite often. In fact, apart from the factor discussion, it is the most natural investment idea that one may have: buy good stocks, even if the challenge is to spot them! There are also debates on the explanatory power of the factor: as an example, Fama and French rejected this factor candidate in the mid-2000s before finally accepting it in Fama and French (2015), this time making Value redundant. At the same time, some other studies find that, taken alone, it is one of the factors with the largest capacity (i.e. it may be able to survive costs up to huge invested amounts, see Landier et al. (2015)), since the quality nature of a company is very persistent (Fama and French (2015), Novy-Marx (2013)).

5.1.5.2 Explanations

The explanation for the long-standing power of the anomaly is hard to give, and the works trying to explain it are not very numerous. Indeed, if markets are efficient and all information is processed, markets should not be surprised by quality information on qualitative stocks. In other words, the information that a firm is a qualitative one is slow-moving and should be priced in by the market. Said with the words of Asness et al. (2019b), we may conceive that Quality stocks have higher prices, yet it is a puzzle to understand why they have higher returns. Indeed, investors may want to pay a higher price for quality stocks, as the underlying companies have better earnings, incomes and fundamentals. But still, the Quality nature is not completely priced in the cross-section, leading to higher risk-adjusted returns. Some academics also question the link that the factor shares with decreasing interest rates, making quality stocks (which behave like bonds) relatively more attractive. Also, with so many implementations and definitions, are we not facing our data-mining trap, again? A risk explanation is hard to defend, since holding a qualitative stock is not risky per se, so those stocks should not be riskier than the other ones, at least in a traditional view of risk. Moreover, the Quality factor is said to be quite robust in market downturns and to perform decently in bad times for the market. See for instance Hirshleifer et al. (2012). The only argument that makes sense is again a behavioural one, with investors making mistakes in their expectations or in the way they process information being the only way to conceive that this anomaly persists. But even those explanations are deceptive according to academics. The same themes as before appear, with investors being over-confident in their ability to forecast (trying to spot outsiders rather than stable, established companies), losing their focus (being more interested in short-term earnings reactions than in stable, long-term fundamental figures), or preferring stocks with a lottery-like profile.

5.1.5.3 Simulation

Portfolio Construction We compute the ratio of Operating Cash-Flow divided by Total Assets for each company. For both Operating Cash-Flow and Total Assets, we compute the trailing 12-month value, in dollars. We then apply a flat ranking of this ratio to obtain a signal between -1 and 1.
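A minimal sketch of this ratio computation follows. The quarterly input layout, and the choice of summing four quarters for the flow (Operating Cash-Flow) while averaging them for the balance-sheet quantity (Total Assets), are assumptions of ours for this illustration.

    import pandas as pd

    def quality_signal(op_cash_flow_q: pd.DataFrame,
                       total_assets_q: pd.DataFrame) -> pd.DataFrame:
        """Flat-rank signal in [-1, 1] from trailing-12-month OCF over Total Assets."""
        ocf_ttm = op_cash_flow_q.rolling(4).sum()      # flow: sum of the last 4 quarters
        assets_ttm = total_assets_q.rolling(4).mean()  # stock: average of the last 4 quarters
        ratio = ocf_ttm / assets_ttm
        # Flat ranking mapped linearly to [-1, 1]: the most profitable names get +1.
        return 2.0 * ratio.rank(axis=1, pct=True) - 1.0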

Results are provided in Figure 5.6. Whatever the zone or the implementation, the results are impressive. As in the Low Vol case, the Long Only implementation is impressive, but unlike Low Vol, the dividend bias (not illustrated here) should be lower. Quality really focuses on well-performing, safe stocks and this translates into returns. The Sharpe ratios obtained in the Long Only and in the Long-Short cases are of the same order of magnitude, which is not always the case: they range between 0.8 and 1.4 before costs, and are particularly high in Europe. The capacity is difficult to illustrate with those simulations, but the robustness of the factor is at play here. What is more striking is that in the Long-Short case, the results are the best of all the factors we scrutinize here. Here again, controlling for the risk of the returns helps to deliver better risk-adjusted returns overall.

Figure 5.6: Performance of a simulated Quality factor. Period: 2004-2020.

Main Bibliography

• Novy-Marx (2013, 2014)

• Landier et al. (2015)

• Asness et al. (2019b)

5.2 Correlation and statistical properties

5.2.1 Foreword

Understanding the correlation and the links between factors is of utmost importance. Those correlations, the exposures of the assets in the cross-section, and their link with the market do change over the years. As for instance stated by Arnott et al. (2019):

“Market betas of factors vary widely over time. The value factor, for example, typically correlates negatively with the market. During the global financial crisis, however, the value factor correlated positively and significantly with the market, performing poorly as the markets tumbled and soaring as the stock markets rebounded. (...) An investor who anticipates factor diversification benefits may be disappointed, as value investors assuredly were during the crisis. (...) Investors need to understand correlations. Many investors mistakenly believe they can diversify away most of the risks in factor investing by creating a portfolio of several factors. In periods of market stress, most diversification benefits can vanish as the factors begin moving in unison. An understanding of how factors behave in different environments (...) and how correlations change through time, is essential.”

We find the same kind of diagnosis in Asness (1997), especially focused on Value and Momentum:

“Momentum and value are negatively correlated across stocks, yet each is positively related to the cross-section of average stock returns. (...) Value strategies work, in general, but are strongest among low-momentum stocks and weakest among high-momentum stocks. The momentum strategy works, in general, but is particularly strong among low-value stocks. (...) Both value and momentum strategies are effective, although value measures and momentum measures are negatively correlated. Thus, pursuing a value strategy entails, to some extent, buying firms with poor momentum. Equivalently, buying firms with good momentum entails, to some extent, pursuing a poor-value strategy.”

Built in practice on different data, the two strategies are naturally anti-correlated. Knowing the typical correlations between the strategies, and understanding their dynamic nature, is then the aim of the present section, and this is what we will try to illustrate (modestly) in the following. After the results of all the numerical simulations of Section 5.1, factor by factor, we saw that the Long Only implementations of the factors were, at each time, very close to the market's returns' time series. Therefore, in the present section we will simply focus on the Long-Short implementation of the factors, as the interpretation will be more distinctive and more illustrative, the results in the Long Only case being mainly driven by beta exposure.

First, let's look at Figure 5.7. In this figure, we compute the relative level of risk of each strategy, under a Long Only implementation (left) and under a Long-Short implementation (right). We take the Long Only or Long-Short daily returns of the aforementioned strategies, aggregated Worldwide. Therefore the base time series considered are the orange curves (left panel) and the pink curves (right panel) of the previous simulation graphs. We start by normalizing each time series to have a risk of one (we divide each return by the standard deviation on the whole sample). Then, we measure a rolling, normalized volatility as a rolling 250-day standard deviation of returns. If the returns' time series were stationary (see Section 2.3 for a definition), we would obtain roughly a constant around one. As our profit curves are generated with a crude portfolio construction, with positions that are directly defined by the signal, all the strategies inherit at first order the level of risk of the market. This is not a surprise, yet we see that if we do not control for the risk of the portfolio, the risk of the market can largely drive the overall level of risk of the factor strategy's returns. There are bumps all along the simulation that are more or less shared by the strategies. The big increase of risk in 2008-2010 is of course due to the Global Financial Crisis. There is also another increase (not as important indeed) that is driven by a lot of political uncertainties (Brexit, the US-China trade tariff war, etc.).
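A minimal sketch of this normalization and rolling-volatility computation, assuming pnl is a date-by-strategy DataFrame of daily returns (our naming), could be:

    import pandas as pd

    def rolling_relative_vol(pnl: pd.DataFrame, window: int = 250) -> pd.DataFrame:
        """Rolling volatility of strategies scaled to unit full-sample risk.
        Stationary returns would give curves hovering around one."""
        unit_risk = pnl / pnl.std()             # divide each series by its full-sample std
        return unit_risk.rolling(window).std()  # rolling 250-day standard deviation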

Figure 5.7: Relative volatility of the simulated factors for an unconditional unit risk on all the simulation. Worldwide, 2004-2019.

Let's jump to Figure 5.8. This figure is enlightening at various levels. Here again we consider the Worldwide daily returns' time series of factors on the period 2004-2019 (yes, our pink curves, again – “Market” stands for the black line). We then look at the overall correlations of the strategies over the period. When we consider the raw strategies, we obtain the graph on the left panel of Figure 5.8. We see a big cluster of strategies that are more or less correlated between them, and anti-correlated with the market. This is what we were suspecting given the performance of Size, which seemed to be an anti-market strategy. The anti-correlation of Size with the market is then not a surprise. Yet, the negative correlations of Low Vol, Momentum and Quality do not appear at first sight when we look at the simulation experiments. All strategies show a positive profit, as does the Market; this shows that those winning strategies are interesting parts of a portfolio since they offer both a premium and a diversification relative to holding the market. Moreover, Value shows an inverse pattern, with a positive correlation to the market but a negative performance. The problem is that the correlations on the left panel are high in absolute value, and we would like to have a purer version of the strategies, less sensitive to market moves. For this, we plot in Figure 5.9 the rolling, 250-day beta of each factor strategy, under a Long-Short implementation. The beta is computed as the OLS rolling regression of the Long-Short returns (pink curves) on the market returns (black curves). Then we see what we suspected until now. Size and Low Vol (with our portfolio construction!) carry a strong negative beta, with a steady exposure. Quality sometimes has a negative beta, down to -0.5, but this beta can go back up to 0, even for some long periods (2 years in 2006 before the crisis, and 1 year in 2018). Value has an intermittent beta, which can spike at some periods, but more or less reflects the positive correlation of the factor with the market. Clearly, Momentum shows the most variable beta with the market: we had a -0.25 correlation on the overall simulation, but the beta varies a lot, between positive values greater than 0.5 and values down to -0.5 at some periods. We can conclude two things: first, the betas of the factors are clearly time varying, as documented in the literature; second, it is important to control dynamically for this beta to recover purer factor returns.

To do this, we also present results with hedged returns. The hedging procedure of the signal is presented in the portfolio construction excerpt on page 62. This elementary hedging of the market is probably not perfect but appears to be quite effective. This is what we show on the right panel of Figure 5.8. The correlations are highly modified by this hedging procedure. The only correlation that remains between the market and a factor is with Size. This illustrates, again, the difficulty of getting a proper Size factor. We keep a strong negative correlation between Value and Momentum and between Value and Quality, a fact that is well known by practitioners. Moreover, Momentum and Quality share a strong correlation, around 30%, which is also a polarisation of factors that appears frequently. We add moreover in Figure 5.10 the performance of hedged factors, compared to their un-hedged versions (those un-hedged versions are, again, simply the pink curves). Surprisingly enough, even if the correlations change a lot as discussed, some factors are not changed much. Momentum, Value and Size are more or less unchanged, Quality is improved, and Low Vol is largely improved.
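For completeness, here is a minimal sketch of the rolling OLS beta and of a beta-hedged return series. The 250-day window matches the text; the rest (names, the one-day shift of the beta) is our own simplification, not the exact hedging procedure referenced on page 62.

    import pandas as pd

    def rolling_beta(ls_returns: pd.Series, market: pd.Series,
                     window: int = 250) -> pd.Series:
        """Rolling OLS beta: Cov(LS, market) / Var(market) over the last `window` days."""
        return ls_returns.rolling(window).cov(market) / market.rolling(window).var()

    def hedged_returns(ls_returns: pd.Series, market: pd.Series,
                       window: int = 250) -> pd.Series:
        """Subtract yesterday's rolling beta times today's market return."""
        beta = rolling_beta(ls_returns, market, window).shift(1)
        return ls_returns - beta * market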

Figure 5.8: Correlations between factor strategies, hedged (right panel) and un-hedged from the market (left panel). Worldwide, 2004-2019.

Figure 5.9: Dynamic beta of factor strategies with respect to the market. Worldwide, 2004-2019.

Figure 5.10: Comparison of the performance of hedged and un-hedged versions of Long-Short factors. Worldwide, 2004-2019.

Finally, we plot in Figure 5.11 a computation of the skewness estimation for each factor. We reproduce the procedure of Lempérière et al. (2015) on daily returns of the hedged versions of the Long-Short implementations of the factors. We rank each returns' time series by absolute value, and plot the cumulative return series where we include the sign of each return. As stated in the paper, this may be seen as an alternative view of the skewness of the factor. We include in the legend, for each factor, a Maximum Likelihood estimation (MLE) of the skewness. This MLE is obtained by fitting a double exponential distribution to the series of returns. If the return data is well fitted by the probability distribution

\[ f(x) = \frac{\alpha_+ \alpha_-}{\alpha_+ + \alpha_-}\left[ e^{\alpha_-(x-x_0)}\,\mathbf{1}(x \le x_0) + e^{-\alpha_+(x-x_0)}\,\mathbf{1}(x > x_0)\right], \]

then the parameters \(\alpha_+\), \(\alpha_-\) and \(x_0\) are estimated and the skewness of the estimated distribution is obtained. The main conclusion of this experiment is that, as detailed in Section 2.5, measuring skewness is a difficult task! Part of the results are at odds with what has been said before in this section. What is coherent, however, is that the risk premium interpretation of Momentum is valid: the MLE skewness is negative, and the highest returns in absolute value are negative returns. For Momentum, the picture seems clear. This fact was identified by Lempérière et al. (2015), as is the positive skewness of Value that we recover here, finding again an outlier behaviour of the Value factor. For Quality, the MLE value is slightly negative and the highest returns are also negative, but with a very modest amplitude when compared to Momentum. So the risk interpretation is difficult to settle because of the weak statistical significance of the figures. The little kink for Low Vol on the same graph has to be put in parallel with a positive MLE skewness, which seems quite contradictory. For Size, Low Vol and Quality, the skewness reading of the risk premium interpretation is then statistically inconclusive. The most striking departure of this experiment from the literature is the zero skewness of the Size factor, which is documented to be negative. This illustrates at least the lack of robustness of the implementation of the Size factor.

Figure 5.11: Skewness computation procedure of Lempérière et al. (2015) for the various factors. In the legend, the skewness value is given by Maximum Likelihood estimation.
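As an illustration of the ranking step of this procedure (our reading of the description above, not the original authors' code), a minimal sketch could be:

    import pandas as pd

    def ranked_cumulative_profile(returns: pd.Series) -> pd.Series:
        """Order daily returns by increasing absolute value and cumulate them.
        If the curve bends down at its right end, the largest moves are losses,
        which we read as negative skewness in the sense discussed in the text."""
        ordered = returns.reindex(returns.abs().sort_values().index)
        return ordered.cumsum().reset_index(drop=True)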

Take-Home Message

• Value is anti-correlated with Momentum and Quality.

• Momentum is the most likely to be a true risk premium.

• Quality is a top-performing factor.

• Size is very difficult to implement in a robust portfolio construction, and leads to weak results.

• Hedging the market beta is crucial, more generally portfolio construction is key.

Chapter 6

The Dark Side of Equity Factors

The marketing of factor-related strategies is terribly aggressive, and we saw in Section 1.2.4 that billions of dollars are now invested under the risk premium paradigm. Of course, it is not all that simple. A lot of market participants (including in particular John Bogle, who can be considered the father of index funds) and academics question the smartness of Smart Beta. The critics are numerous and cast a shadow on both the way research on the topic is produced and the effective performance of the related strategies. For instance, the point of Bogle is that it may be less costly and better performing to simply hold the market than to play Smart Beta strategies. In the meantime, Malkiel (2014) stresses that the success of factor-related strategies relies more on recent marketing, as it is a trendy topic.

We will explore in this chapter the cornerstone problems of factor investing (in a broad sense), which are more or less wrapped up for instance in Arnott et al. (2019). We want to list the elementary questions that an investor should ask before being tempted to dive into factor investing. The main questions are the following. What is the probability that a factor is the result of pure data mining? We will describe the explosion of attempted factor discoveries in recent years. What is the real performance of factors? We will review some general academic concerns, the strategies having been studied individually in more detail in Chapter 5. We will also look at implementation costs, as trading costs are often forgotten in academic studies. Finally, we will conclude by studying the effects of crowding: crowding is a debated effect that we can hardly define, but it may have tremendous consequences on investments made in relation to factors.

6.1 So many factors...

6.1.1 Number of factors and p-hacking

Research on equity strategies and anomalies in a broad sense has always been active and productive. As seen before, the works of Banz, Basu, Fama and French are such examples. But with the advent of factor-related investment, attempts at factor discoveries have been steadily increasing. This research has always been the core activity of hedge funds and speculative boutiques. Yet, that research is of course unpublished; what changed is that discoveries around anomalies started to be the result of the work of academics. The incentive then became to be the one who discovered a given factor. One sometimes encounters the phrase “publish or perish”.

The most famous paper on this topic is the work of Harvey et al. (2016), whose title is ironically “...and the cross section of expected returns”, which is the phrase that most researchers use to promote their research around anomalies. In this paper, the authors compile the 314 factors published in top-ranked academic journals, and many attempts labelled as “newly discovered factors” submitted to, yet not accepted by, the same journals. The proportion of factors significant in simulation but not in real trading is tremendous. The paper got famous because of this finding: the competition for being published in a top-level journal is so high that there is an implicit incentive to produce significant and striking figures. There is no glory in producing negative yet scientific results! We plot in Figure 6.1, based on the appendix of Harvey et al. (2016), the number of attempted factor-related publications in those top journals. We see that in recent years, this number has drastically increased.

Figure 6.1: Number of factors published in reference academic journals as reported in the appendix of Harvey et al. (2016)

In recent years, there has been a strong increase in publications trying to discover and propose new factors. If a hedge fund discovers a market anomaly, the fund remains silent about it and simply trades it for some months or years. Finding an anomaly academically and branding it as a factor is more subtle and requires clearing more hurdles, as described in Section 3.3. One of the hurdles is of course performance. As described before, a factor that is a risk premium should be the compensation for a risk taken, and this premium should materialize over the long term through a potentially modest, yet significant, performance. In the same vein as Harvey et al. (2016), Hou et al. (2017b) tried to replicate 447 factors or anomalies, finding that between 65% and 85% of them (depending on the level of expected significance) were simply not statistically significant, meaning that they should not have been published or promoted. Green et al. (2013) did the same exercise with 94 cross-sectional equity factors on the period from 1980 to 2014. Their work showed that only 12 out of 94 were significant. Their conclusion was, again, that the number of independent drivers of equity returns is small and limited, and that the amplitude of their returns has been decreasing since the beginning of the 2000s.

This attitude towards research ethics, generally called data snooping, has been called p-hacking by Chordia et al. (2017). It refers to researchers adapting their research environment and the context of their studies so that their results pass the significance threshold materialized by the p-value. This is what is observed in the aforementioned studies: when reproduced in another framework, the statistical significance of the results disappears. Chordia et al. (2017) generate nearly 2.1 million trading strategies and use this huge set of strategies as a sandbox to gauge the effects of p-hacking in financial research. Their findings are in line with the previous words of caution: data snooping is a real problem and, after accounting for it, very few trading strategies remain profitable. In fact, it is most probable that research is effectively conducted on many strategies but only the profitable ones are presented, those having a future in a top-level journal. In the end, out of the 2.1 million strategies, Chordia et al. (2017) keep only 17 strategies that “survive the statistical and economic thresholds”. Hou et al. (2017a) provide a study on 447 factors, and the final conclusion is that two thirds of them fail to post a statistically significant performance as soon as the experimental setting excludes the 2% least liquid stocks. This shows again that backtest results may be very weak and may be due to a selection bias, with the results presented being those with the highest potential, in spite of their questionable robustness. A funny example is also given by Novy-Marx (2014), who highlights that one may use, in order to forecast expected returns of stocks, odd variables such as the alignment of planets or weather reports in New York. In this specific case, it is easy to tell that this is pure data mining. But in a more ambiguous context, when finding finance-related explanatory variables in a regression, how can one be sure that those variables are not more relevant than planets or the weather?
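To see why testing many candidates mechanically produces false discoveries, here is a small, self-contained illustration (ours, not taken from the cited papers) of the family-wise false-positive rate when N independent strategies with no true alpha are each tested at the 5% level:

    def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
        """P(at least one spuriously 'significant' result) among n independent
        tests of strategies that in truth have zero alpha."""
        return 1.0 - (1.0 - alpha) ** n_tests

    # With a handful of tries the risk is already material; with hundreds of
    # candidates, a spurious "discovery" is a near certainty.
    for n in (1, 10, 100, 314):
        print(n, round(prob_at_least_one_false_positive(n), 4))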

Main Bibliography

• Harvey et al. (2016)

• Hou et al. (2017b)

• Green et al. (2013)

• Chordia et al. (2017)

6.1.2 Alpha decay

So there are too many so-called factors that are not, in the end, true factors. Depressing. Another problem is to evaluate whether the true ones remain profitable after they are published. This is the topic of another very famous paper, published by McLean and Pontiff (2016). In this now very famous contribution, the authors show that the performance of trading strategies decreases after their publication in research papers. Like Hou et al. (2017b), they were unable to replicate the performance of 12 factors from an initial set of 97. More precisely, on this set, they compared the in-sample (before publication) and the out-of-sample (after publication) performance of those factors. The authors showed that after publication, the performance is divided by a factor of 2! They estimate that the factor premia are inflated by 26% because of data snooping. After a factor is discovered and published, the related premium falls drastically.

In the same way, Chordia et al. (2014), Calluzzo et al. (2015), Linnainmaa and Roberts (2016), and Arnott et al. (2019) find the same decrease in the profits earned by anomaly-based trading over time, and roughly the same pessimistic coefficients. This phenomenon is called alpha decay. Calluzzo et al. (2015) in particular relate this after-publication decay to the trading of those anomalies by institutional investors, highlighting the role of institutional investors and implying that this is, in the end, an improvement of market efficiency. Alpha decay is a broader concept (the general fact that an anomaly, published or not, delivers less return as time passes) but reflects here the fact that after its discovery, even a profitable and significant factor sees a drastic decline in performance. As an illustration, the January effect of Rozeff and Kinney (1976), discovered in the 1970s, is now documented to have disappeared. This is also at the heart of differentiating true risk factors from market anomalies, whose returns may disappear over time as they become more and more traded.

Figure 6.2: Some random news on bloomberg.com on the performance of factors (March 2019).

Main Bibliography

• Green et al. (2013)

• Calluzzo et al. (2015)

• McLean and Pontiff (2016)

• Linnainmaa and Roberts (2016)

• Harvey et al. (2016)

• Hou et al. (2017b)

• Chordia et al. (2017)

6.2 Disappointing performance

Another question, partially linked to the notion of data mining, is the question of the effectiveness of the performance of factor strategies. As stated before, the compensation for the risk taken may be weak, but it should in any case be significant (see Chapter 4 on backtesting). As we can see in Figure 6.2, which is a snapshot of a famous financial news website, investors may be disappointed by the performance of quant strategies, to the point that it translates into huge fund outflows.

For instance, Arnott et al. (2019) identify that over the last 15 years the performance of the most celebrated factors has largely decreased and that, over that period, no big equity factor except the market has been able to deliver performance. Feng et al. (2019) confirm the weakness of the performance stability by studying 99 factors monthly from 1980 to December 2016. A large portion of factors appear statistically insignificant, with recently discovered factors appearing largely redundant relative to previously identified ones.

Apart from the significance of performance, the other aspect is the timing and the identification of the source of performance. Indeed, in addition to their inconsistency, Malkiel (2014) finds that the performance of Smart Beta strategies depends heavily on economic conditions and on the risk-budgeting procedures used. In general, academics (see e.g. Novy-Marx and Velikov (2016)) find that factor-related strategies may have long-standing periods of strongly negative performance. Moghtader (2017) underlines the under-estimated hazards of long-standing and heavy drawdowns of factor strategies. This may be due to the clearly non-Gaussian nature of the returns' distribution (see Arnott et al. (2019)). In terms of exposures, some research has shown that some peculiarities of the factors may explain the performance. But when the peculiarities repeat from one factor to another, this is more problematic. In practice, many factors seem to be exposed to micro-capitalization stocks that may, in some situations, drive the performance (see for this Malkiel (2014), Hou et al. (2017b)). As pointed out by Arnott et al. (2019), a risk management that is not very careful when allocating between factors leads to a brittle diversification, and an exposure to negative tail risk that is in fact increased. This fact seems to be often forgotten by investors. Ignoring the propensity of factor portfolios to be exposed to tail events would be a major mistake.

We do not take much time to explore the research around performance, because we have explored it factor by factor in Chapter 5, and also because performance, as intermittent as it may be, depends on the simulation framework and the chosen portfolio construction. Listing all the related papers on the topic would be a bit pointless. See for instance Polk et al. (2019) for further reading.

Main Bibliography

We won't spend much time on factor timing in this course, mainly because the empirical results are inconclusive. Timing is the ability to forecast the absolute or relative performance of one factor over another. This possibility is heavily debated. Papers that try to ride trends in factor performance include e.g. Baker et al. (2011), Ilmanen et al. (2019), or Barroso and Santa-Clara (2015). Others try to infer changes in the business cycle and in macro variables to time factors: Grant and Ahmerkamp (2013), Muijsson et al. (2015), Winkelmann et al. (2013). One may also refer to Blin et al. (2017b), Asness (2015) and Arnott et al. (2019).

6.3 Implementation Costs

A major criticism against factor strategies is that they generally carry huge replication costs. Those costs are quite intuitive: passive-like investments are often presented as alternatives to market-capitalisation indices, which are by nature the least costly equity portfolios. Following a factor-related strategy is, in practice, costly. The main question that Esakia et al. (2017) identify is not whether trading a strategy induces costs (this is always true, and paper trading is never real trading); the question is rather whether, after taking those costs into account, the net returns are still profitable.

More generally, estimating costs is a sophisticated topic, and on the notion of price impact estimation in itself there is no clear consensus at this point. Several typologies of costs exist (see e.g. Kissell and Glantz (2003)), and a clear way to understand costs is to understand how orders are executed. The first method for an investor is to deal with over-the-counter orders. The price is directly negotiated with another market participant: the price that is finally paid to execute the order is known in advance. The second one is to delegate execution to a broker who can guarantee a defined execution price. The execution risk is transferred by the client to the broker, who bears the management and execution of the trades. The profit that the broker can generate is the difference between the price guaranteed to the client and the effective execution price the broker can achieve. The third possibility for the investor is to directly trade and execute on the markets (equivalent to the role played by the broker in the second case). This requires expertise and a heavy infrastructure. For this we refer to Hasbrouck (2007), Lehalle and Laruelle (2013), Foucault et al. (2013) or Guéant (2016), which give detailed insights on execution alternatives.

Main Bibliography

• Hasbrouck (2007)

• Lehalle and Laruelle (2013)

• Guéant (2016)

6.3.1 Estimating costs

This paragraph applies in any equity context and does not rely on the factor literature. Costs are related to real trading and strategy execution. There is an intuitive separation between direct and indirect costs. Direct costs represent a compensation for every participant involved in a trade, but their estimation is not an issue. Direct costs include commissions, brokerage fees, settlement and clearing fees, exchange fees and taxes. In the case of Long-Short equity portfolios, there is also a shorting1 cost: for each stock each day, each broker issues a locate file that provides the maximal amount of shares that can be shorted that day, with the associated cost of short-selling. In addition to the daily easy-to-borrow rate, which is generally negotiated, the hard-to-borrow rate potentially adds an incremental layer of cost when the resource in the related stock becomes scarce, and consequently more expensive to sell short. Indirect costs are subtle to evaluate and measure. They represent a variable part of execution costs, which is not known in advance and usually represents the largest part of the execution cost for large trades. Bid-ask spreads are a specific example of such costs. They are linked to the liquidity of the market, depend on the asset and the exchange, are indeed indirect costs and have to be estimated, but have a low variability across time. Therefore, their estimation may lead to treating them as nearly systematic costs that are roughly constant over time, even if technically unknown. Indirect costs are already materialized in prices and are of course not paid separately. In the following they will be understood as impact costs: the intuition is that the more we trade, the more impact we have on the market and the more costly it is.

However, after the trade, it is not possible to observe exactly the effective execution cost. Only assumptions can be made, even in the post-trade analysis. Those costs have to be estimated through a functional

1See e.g. https://www.investopedia.com/terms/s/shortsale.asp for a full description of the shorting mechanisms.

form in order to be added to the optimization program for a rebalancing at date t, in anticipation of the underlying costs affecting the gain that the trading will generate between t and t + 1. Impact is not explicit and is related to the (potential) activity of the investor on the market. For a given investor able to execute orders herself, the most precise way to estimate impact is to keep all proprietary executed orders and fit her own impact model on her own trades.

The idea is the following: when a market participant is buying, she or he consumes a part of the liquidity of the order book, pushing the price up. Fabozzi et al. (2008) define market impact costs as “the price an investor has to pay for obtaining liquidity”. The impact of a trade will depend on its size. Parametric academic models fortunately already exist, see e.g. Almgren et al. (2005), Engle and Ferstenberg (2007), Moro et al. (2009). We cannot provide here a review of micro-structure models, and let the reader consult the in-depth reviews of models available in Bouchaud (2013), Lehalle and Laruelle (2013). A generic and easy-to-follow review of the literature is given in Chapter 3 of Ielpo et al. (2017).
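To fix ideas, a commonly used parametric form in this literature combines half the bid-ask spread with a square-root price-impact term. The sketch below is a stylized illustration of that functional form under assumed names and coefficients, not the model of any specific paper cited above.

    def estimated_cost_bp(trade_dollars: float, adv_dollars: float,
                          daily_vol: float, spread_bp: float, y: float = 1.0) -> float:
        """Stylized per-trade cost in basis points: spread/2 plus a square-root
        impact term Y * sigma * sqrt(Q / ADV), with sigma the daily volatility,
        Q the trade size and ADV the average daily traded value."""
        impact_bp = y * daily_vol * (trade_dollars / adv_dollars) ** 0.5 * 1e4
        return spread_bp / 2.0 + impact_bp

    # Example: trading 1% of ADV on a stock with 2% daily vol and a 10 bp spread.
    print(round(estimated_cost_bp(1e6, 1e8, 0.02, 10.0), 1))  # ~25 bp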

This is the general picture for equity trades. There is a narrower yet still wide literature dedicated to the estimation of the costs inherent to factor implementation strategies. Two excellent papers tackle this issue: Patton and Weller (2019) and Brière et al. (2019); both give an excellent overview of figures, data and approaches. The main approaches are well described by Patton and Weller (2019), who highlight that the approaches used by academics generally belong to two broad categories. Those categories, without any surprise, mirror the more general approaches detailed above. The first approach is to use proprietary trading data to analyse costs for a single firm (like e.g. in Keim and Madhavan (1997), Engle et al. (2012), Frazzini et al. (2018)). If they cannot be representative of the random trader, such data provide at least “a lower bound on the implementation costs of factor strategies” (in the words of Patton and Weller (2019)). The second approach uses quotes and broad market-trading data to estimate trading costs for individual stocks (see this time Lesmond et al. (2004), Korajczyk and Sadka (2004), Novy-Marx and Velikov (2016) for instance). In this case, the studies “extrapolate price impact estimates from small trades to large factor portfolios or ignore price impact costs entirely”. See? This is a complex task! And unfortunately the results are mixed.

6.3.2 The optimistic figures

Let's start with the good news. Some contributions find that, after all, trading factor-related strategies may in the end be profitable. In particular for Long Only Momentum portfolios: Korajczyk and Sadka (2004) find realistic sizes for such funds between 2 and 5 billion US dollars (this being the capacity beyond which trading is no longer profitable). Korajczyk and Sadka (2004) use TAQ data to infer effective spreads and impact functions from Lesmond et al. (2004). This contribution is important because it appeared quite early in the literature on factors. Two major contributions extend this analysis to other factors. Frazzini et al. (2018) use proprietary data from a major investment firm (AQR), representing an aggregated amount of more than 1 trillion dollars of trades. The major equity anomalies Momentum, Value and Size are under scrutiny and all appear to be implementable and scalable up to several tens of billions of US dollars, respectively 52, 83 and 103 billion for each. The associated trading costs are 270, 147 and 94 bp per trade for the very same strategies, for a Long-Short implementation. Finally, using effective bid-ask spreads based on daily closing prices, Novy-Marx and Velikov (2016) provide an equivalent contribution for a typical small investor, trading small amounts. For Momentum, Value and Size they find trading costs of 780, 60 and 48 bp per trade. The estimated capacity is exactly equivalent to the one found by Korajczyk and Sadka (2004) for Momentum (5 billion US dollars), but they find that an investor can scale up to 50 billion US dollars for Value (so in line with Frazzini et al. (2018)) and finally 170 billion for Size. It is most probable that this study overestimates the true capacity, since price impact is not really taken into account. But it gives at least an upper bound.

This overall picture is confirmed in the exhaustive review of Brière et al. (2019), whose results (using trading cost estimates from a public database of institutional executions called Ancerno) are more in line with those of Novy-Marx and Velikov (2016) than with Frazzini et al. (2018). Momentum appears to be the most expensive factor, followed by Value and Size. But the results are more optimistic, since the estimated trading costs are only 40% of the estimated costs of Novy-Marx and Velikov (2016). The overall conclusion is that Size, Value and Momentum are robust to trading costs as long as the size of the portfolio is reasonable. Table 6.1 recalls the various figures found in the literature (for Brière et al. (2019) we take here the results for the parametric estimation at 1 billion US dollars).

Paper                           Size   Value   Momentum
Korajczyk and Sadka (2004)         -       -        200
Novy-Marx and Velikov (2016)      48      60        780
Frazzini et al. (2018)           154     146        351
Brière et al. (2019)              64     123        862

Table 6.1: Costs estimated in bp per trade in various papers.

6.3.3 The pessimistic figures

However, some academics are not on the same page. Initially, Keim (2003), using institutional trades from more than 30 investment firms, found that all the potential gains of the Momentum strategy were destroyed by trading costs. This is also supported by Lesmond et al. (2004), who draw the same conclusion. Stanzl et al. (2006) do the work of estimating stock-specific impact functions on more than 5000 stocks (yes!) and apply those formulas to the aforementioned three factor strategies, Momentum, Value and Size. Rather than billions, the authors find capacities around... millions, at best hundreds of millions for Size. Following this study, it is not realistic to invest in factor strategies. Plain and simple.

Two more references reach the same conclusion. Chen and Velikov (2017) examine 120 potential market anomalies and use high-frequency data (from ISSM and TAQ) to estimate costs. The result is that an average investor should expect at best small profits from those anomalies once costs are taken into account. In addition to costs, the authors find that half of the gains disappear simply after the publication of a factor; the costs make the other half vanish. To conclude, the latest contribution of Patton and Weller (2019) provides results in the same vein. Using parametric microstructure models and real asset managers' data, still on the same factors, the authors again confirm the tendency of Momentum to be less profitable in realistic trading than its theoretical gains suggest. The conclusion is again that Momentum strategies harvest no profit for an average investor when the various trading costs are taken into account. A striking conclusion of the paper is that "shorting frictions explain roughly half of mutual fund under-performance on momentum, and between one fifth and one third of under-performance on value." An already presented fact states that Size is maybe the only factor strategy that has some potential and can resist trading costs a bit, unlike Value. This is paradoxical since, as seen in Section 5.1.3, the return premium earned by the Size anomaly is in itself questionable. The anomaly probably benefits from its low turnover, which hardly affects the potential returns. The general picture is not in favour of a realistic implementation of factor strategies, which is a real drawback.

Main Bibliography

• Beck et al. (2016)

• Patton and Weller (2019)

• Bucci et al. (2018)

• Frazzini et al. (2018)

• Novy-Marx and Velikov (2016)

• Brière et al. (2019)

6.4 Crowding

Crowding is a very trendy topic but a nightmare for investors, or at least they think it is. We only see the bad consequences of crowding, which lacks a true definition and is potentially unpredictable. Crowding describes market states where investors are tempted to invest in the same way, in the same stocks or securities, and in particular in similar proportions. Said otherwise, everybody is doing the same thing. The problem is that the process is generally time-dependent: a factor is discovered, published, spread in the financial community. More and more investors try to replicate it, more and more money comes into the market along this strategy, and the factor becomes crowded; the mispricings that may appear are arbitraged away, decreasing the returns of the “original” factor. “The anticipated arbitrage leads to disappointing returns.” (Arnott et al. (2019)). So, a factor is said to be crowded when it is so famous that everyone is playing it. A popular belief (and hope) in the market is that varying the implementations of factor investing should help decrease crowding and potential trade correlation.

Why may it be a nightmare? Because if everyone sells at the same moment, supply will overcome demand, creating a dramatic market impact and tremendous losses, independently of the true fundamentals of individual stocks and their underlying business sustainability. In a famous paper, Khandani and Lo (2011) document such an effect in 2007 when, for exogenous reasons unrelated to the stock market, many quantitative Long-Short hedge funds unwound similar positions in the same securities, heavily impacting the market through synchronized actions and leading to huge losses, unrelated to the idiosyncratic fundamentals of stocks. Even without talking about extreme events, crowding is feared by investors because even a small daily crowding is said to decrease the performance of a factor or, said alternatively, to sustain over-valuation, decreasing the possibility of future returns (see e.g. Cerniglia and Fabozzi (2018)). An interesting point is that there are few cornerstone academic references on crowding, and most of the research effort is done by practitioners.

6.4.1 General intuition

This notion is hard to define. Crowding, if there is any, also interacts with transaction costs (either when the strategy becomes too expensive to trade, or when a flow of cheap money artificially pushes prices so that the returns are only the price impact of the other investors). The links that crowding shares with performance are ambiguous. In the way it is presented above, crowding is difficult to disentangle from alpha decay. But which crowding are we talking about? Crowding of strategies, of portfolios or of trades? This is a subtle question, and we will try to give more details.

It is all about knowing whether the crowding we fear is static or dynamic. A "static" view of crowding relates to equivalent aggregated positions taken by investors. A "dynamic" view of crowding accounts for synchronized, equivalent trades of various players.

First, let's observe that the crowding of positions in the main market portfolio is not a problem. The market portfolio is endogenously driven by all trades and is liquid enough not to be affected, in average return or in volatility, by simple synchronized actions, as opposed to strong market moves due to macroeconomic news. So crowded positions are more a concern for alternative strategies, whether they are played Long Only or Long-Short. Crowded positions arise, according to Doole et al. (2015), when "there is a significant overlap of portfolio positions and allocations as a result of crowded trades which, in total, add up to a significant share of a stock's free-float market capitalization". Crystal-clear! But what happens when a lot of positions are equivalent? In fact, nothing really frightening2. The only concern is that an exogenous shock on another asset class, or on a particular part of the portfolio, may force investors to liquidate quickly at the same time.

So what could be really puzzling is probably synchronized trades rather than crowded positions, since trades are done in a way that each investor individually tries to build and maintain a given exposure to a precise portfolio replicating a factor. In a mild situation, synchronized trades increase the apparent impact of each investor; cumulated, they lead to an over-valuation of stocks, decreasing future performance. This collective, unseen herding makes every single investor underestimate the real impact of trades and the capacity of his or her very own strategy. However, as Doole et al. (2015) state, "commonality in trading does not by itself necessarily point to crowding". This is why this notion is sophisticated to apprehend. An interesting fact is that, generally, pension funds and hedge funds end up with very different positions, and that the correlation between a Long Only version and a Long-Short version of the same factor may be quite low. What is sure, however, is that a massive, synchronized liquidation has in general dreadful consequences.

6.4.2 Some crowding measures

Crowding metrics are mainly attempts at measures. There is no real consensus on the way crowding can be measured nor, for a given measure, on how to assess its effectiveness. Of course, data providers, investors, funds and regulators are keen on obtaining such measures, but it is more than likely that nobody in the market has a clear view on what is going on with crowding. Crowding metrics are generally separated into two categories: returns-based and holdings-based metrics. Returns-based metrics can be computed in a timely manner, are easy to compute, and are available at a higher frequency. Holdings-based analyses rely on particular datasets (mostly called 13F) that sum up the explicit positions in individual stocks of institutional holdings (pension funds, hedge funds, etc.). Those datasets provide a direct and explicit glimpse of (granular) institutional trades and positions of investors in the market.

2To be honest, a risk exists nonetheless when short positions are accumulated. If conditions tighten and shorting becomes increasingly costly, it may be a problem for the investor. Conversely, if the stock's price rises, the loss being potentially unbounded, this may also be a problem for her. But this is not a problem of crowding. It may also happen that many players try to sell short without enough stock available to borrow: in this case there is a short squeeze and the capacity of the portfolio being played may be questioned.

6.4.2.1 Returns-based metrics

The first type of measures are returns-based metrics. They exploit the fact that we have at hand a strategy that gives an explicit composition of a traded portfolio. This means that the computation can be done at the investor level (with a proprietary portfolio) or in a conceptual way, but without an explicit disclosure of portfolios to other agents. This means in particular that a regulator or a market participant cannot compute a precise measure for "a" factor. This is the main drawback of those measures: they capture a general activity on stocks' trading, but this trading activity embeds the overall market activity of all agents, regardless of the precise portfolio under study. So we will assume in the following that one has at hand a portfolio representing a factor, with a long leg L and a short one S.

A first set of dynamic metrics are based on the correlation between stocks. Since correlation is slowly varying and needs some data to be computed, those measures are all quite slow. The idea is that the more a strategy is crowded, the more the stocks should co-move, whatever the direction. An increased correlation should be the sign of an increased crowdedness. Said otherwise, correlation tries to capture the average impact of trades that go in the same direction. A first measure is thus the mean pairwise correlation of stocks in the long leg L minus the mean pairwise correlation of stocks in the short leg S (see for instance Lou and Polk (2012) or Cahan and Luo (2013)). A first observation is that this measure is ill-balanced, since it may remain unchanged while the correlation increases in each leg. Moreover, do not forget that when the market goes down, stocks tend to be more correlated with each other: this may pollute the interpretation of such a measure. Some alternatives use a pairwise correlation normalised by its dispersion. More advanced metrics involve a measure of tail dependence, using copulas to quantify the probability of extreme negative co-movement, see Cahan and Luo (2013) again.
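As an illustration, here is a minimal Python sketch of this long-minus-short pairwise-correlation measure. It is only a sketch under stated assumptions: returns is assumed to be a pandas DataFrame of daily stock returns (one column per stock), long_leg and short_leg are lists of column names, and the 120-day window is an arbitrary choice; none of these names is prescribed by the references above.

import numpy as np
import pandas as pd

def mean_pairwise_corr(returns: pd.DataFrame, window: int = 120) -> float:
    # Mean of the off-diagonal pairwise correlations over the last `window` days.
    corr = returns.tail(window).corr().to_numpy()
    iu = np.triu_indices_from(corr, k=1)      # strictly upper triangle
    return corr[iu].mean()

def correlation_crowding(returns, long_leg, short_leg, window=120):
    # Crowding proxy: mean pairwise correlation on the long leg minus the short leg.
    return (mean_pairwise_corr(returns[long_leg], window)
            - mean_pairwise_corr(returns[short_leg], window))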

Other technical measures do exist; they are generally faster than correlation but their definition is subject to caution. Some practitioners use for instance the cumulative performance of a strategy, the number of stocks going up minus the number of stocks going down, or the number of stocks in the long leg that are close to their one-year maximum minus the number of stocks in the short leg that are close to their one-year minimum. As simple as they may seem, we do not believe that those measures are realistic crowding measures. They are more or less monitoring recipes that may be insightful about the portfolio but that are too far from any theory to model factor crowding.

6.4.2.2 Holding-based metrics

Holdings-based metrics are the most interesting since they are deeply rooted in data that helps define crowding. The idea behind those measures is straightforward: stocks with excessive or massive investments relative to their market capitalisation, whether long or short, and in particular with a high correlation of changes in those weights, may be over-crowded. The metrics are in general related to buying power and holding concentration. They require access to institutional investors' holdings and aim at measuring the changes in ownership structure. Buying power is measured by the number of funds buying the stock divided by the sum of the number of funds buying and the number of funds selling. A fund must already hold the stock to be taken into account in the measure. What is computed is the median over the long leg L minus the median over the short leg S.

Alternatively, the concentration for a stock i is defined as:

\[
c_i = \sum_{k=1}^{N_{\text{funds}}} s_{i,k}^2
\]

where $s_{i,k}$ is the proportion of shares of stock $i$ held by fund $k$. Here again, what is computed afterwards is the mean of the concentrations over the long leg minus the mean concentration over the short leg. Another measure, called entropy, measures the same quantity by replacing, in the expression of $c_i$, the squared share proportion $s_{i,k}^2$ by $\log(s_{i,k})$.
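A minimal sketch of these concentration and entropy computations, assuming (hypothetically) that shares is a pandas DataFrame of the ownership proportions s_{i,k}, with one row per stock i and one column per fund k; the long-minus-short aggregation mirrors the other holdings-based metrics.

import numpy as np
import pandas as pd

def concentration(shares: pd.DataFrame) -> pd.Series:
    # c_i = sum_k s_{i,k}^2, a Herfindahl-type concentration per stock.
    return (shares ** 2).sum(axis=1)

def entropy(shares: pd.DataFrame) -> pd.Series:
    # Same aggregation with s_{i,k}^2 replaced by log(s_{i,k}); zero holdings are skipped.
    return np.log(shares.where(shares > 0)).sum(axis=1)

def crowding_spread(metric: pd.Series, long_leg, short_leg) -> float:
    # Mean of the metric over the long leg minus mean over the short leg.
    return metric.loc[long_leg].mean() - metric.loc[short_leg].mean()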

In the academic literature, Greenwood and Thesmar (2011) use mutual fund flows data and compute a "fragility" measure that tries to proxy a similarity both in fund flows and in the concentration of portfolios' holdings. It uses in particular a covariance matrix of trades, scaled by ownership. The measure increases when stock ownership tends to concentrate. Some years later, academics started to use short-interest data on institutional holdings, see e.g. Cahan and Luo (2013) or Hanson and Sunderam (2013). Let's recall that the correlation between a Long Only and a Long-Short version of the same factor is sometimes very poor. This means that those holdings-based metrics tell only a part of the story, since the underlying data do not include short positions. The problem in general is that even if holdings-based metrics are often promoted by practitioners as the "purest crowding source", holdings may be incomplete, biased and potentially delayed. Indeed, holdings data have a low refresh frequency (every three months in general). They are obviously published less frequently than returns data! Moreover, many funds whose assets under management are below a threshold (100 million US dollars in general) do not appear in the data, and the short leg is not always available.

6.4.2.3 Other measures

Other measures may involve other kinds of data. In particular, some measures use flows (from funds or ETFs), that is the amount of money going from one fund to another, or short-interest data. Let's recall that shorting comes at a cost, as detailed supra on page 90. This is very important since limits to shorting are maybe one side of crowding. The drawback of using short-interest data is that it is mostly static and does not account for investors' movements or intentions to trade in one direction or the other. Yet it may constitute a precise snapshot of what is happening on the short legs of portfolios.

The main quantities that are looked at are naturally the cost of shorting and the utilisation. Shorting cost is simply measured by the hard-to-borrow cost times the relative positions of the portfolio. The higher it is, the higher the cost for the investor, possibly forcing her to liquidate her positions: a rise in shorting cost may therefore act as an advanced indicator of crowding. It is also a sign that the demand for the stocks is increasing, that the supply of stocks to short is scarce, and that the capacity of the current strategy is limited. This aspect is also captured by utilisation, which is often used in itself as a crowding measure. Generally, the utilisation averaged over the extreme decile is a common measure. The more the stocks on the short leg are utilized, the higher the demand for them, and the higher the probability that the factor is crowded. Variations include regressing the short interest on the quantiles of usual factors. Of course, those measures cannot account for an increasing long demand for other stocks. They are asymmetric, yet very useful since based on precise data closely related to the topic under study. Another (not so intuitive) drawback is that in some countries, short supply becomes very scarce for fiscal reasons at dividend dates. This depletion of the supply is totally artificial and unrelated to crowding, but it may periodically bias the measure heavily.

6.4.3 The particular case of the value spread

A particular measure is linked to the value spread3, since it is at the heart of a controversy between two active researchers, Rob Arnott and Cliff Asness (the latter being at the head of the asset manager AQR). Asness advocates measuring the expensiveness of a given factor with this value (or valuation) spread, using various metrics including price, book value and sales. In its simplest form, it is measured as the difference between (Book Value over Market Value) on the long leg of the factor and (Book Value over Market Value) on the short leg. Of course, it is initially defined on the Value factor, computed on Value stocks minus Growth stocks. One difference between Arnott and Asness is that Asness advises to vary the metrics and also to replace Book Value with income variables such as sales intensity. It is a static metric that may be slow to change. The general idea is that the more expensive a factor is, the more crowded it is.

3Book Value is defined in Section 5.1.1.

Asness (2015) argues that there are several ways to define the value spread for a given strategy. According to the paper, the value spread should be conceived as a cheapness measure rather than a crowding measure. Indeed, for a value strategy, the long leg is by construction always the cheaper one. Moreover, the value spread changes over time and, if there is a buying pressure on the long leg and a selling pressure on the short leg, the value spread should mechanically narrow.
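A minimal sketch of the book-to-market version of the value spread defined above, assuming two hypothetical pandas Series indexed by stock, book_value and market_value, plus lists of tickers for the long and short legs; other metrics (sales, income) would simply replace the numerator.

import pandas as pd

def value_spread(book_value: pd.Series, market_value: pd.Series,
                 long_leg, short_leg) -> float:
    bm = book_value / market_value            # book-to-market per stock
    # Spread = average cheapness of the long leg minus that of the short leg.
    return bm.loc[long_leg].mean() - bm.loc[short_leg].mean()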

The particular status of this measure is that, for anomalies that are too difficult to justify theoretically, the argument is that those strategies are "unanchored" (Doole et al. (2015)) and that the returns may appear overvalued because the anomaly is in fact driven by the deviation of the stock's valuation from its true fundamental level. The argument is therefore to look at the relative valuation spreads of the buckets. "Highly levered companies became much cheaper than their lower-levered counterparts, as investors shunned Leverage" (Doole et al. (2015) again).

The point of both researchers is to underline that if the strong performance of a given factor comes from a strong, idiosyncratic alpha, this is a good sign that the factor will continue to deliver positive performance in the future. But this performance may in fact be due to artefacts in the valuation of the factor. As Arnott et al. (2019) state: "the backtest of the candidate factor might look impressive if it begins when it has low valuation levels and ends when it has high valuation levels, a point made by both Fama and French (1992) and Arnott and Bernstein (2002). This surely impacts the forward-looking premium, as Arnott et al. (2016) argue". In other terms, many factors may deliver performance only because they become more expensive (in terms of the valuation spread). Arnott therefore advocates measuring the structural alpha of the factor by subtracting the returns of the valuation spread from the factor's returns. Asness goes one step further by defining the structural alpha of the factor as the factor's returns minus the returns of the valuation spread times the beta of the factor. They also disagree on another point: Arnott thinks that, in order to time factors, trending on their performance is a bad idea, yet building a contrarian strategy based on the valuation spread may be a solution. Asness disagrees and does not see the valuation spread as an advanced variable useful for factor timing.
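To fix ideas, here is a hedged sketch of the two "structural alpha" adjustments discussed above, assuming factor_ret and spread_ret are two aligned pandas Series of periodic returns for the factor and for its valuation spread (hypothetical inputs; the papers do not prescribe this exact implementation).

import statsmodels.api as sm

def structural_alpha_arnott(factor_ret, spread_ret):
    # Subtract the valuation-spread returns from the factor returns.
    return factor_ret - spread_ret

def structural_alpha_asness(factor_ret, spread_ret):
    # Subtract beta times the spread returns, beta being estimated by OLS.
    beta = sm.OLS(factor_ret, sm.add_constant(spread_ret)).fit().params.iloc[1]
    return factor_ret - beta * spread_ret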

PART II

EMPIRICAL INSIGHTS ON EQUITIES

Stylized facts and heuristics to better understand equity markets.

Chapter 7

Stylized Facts on Equities

7.1 What are stylized facts?

With a variety of asset classes with different liquidity, constraints and natures, speaking in general of financial returns is rather difficult. No statistical pattern appears with certainty and perfect regularity in the data. Yet, it is known that financial returns show some stylized facts. Stylized facts are statistical patterns that tend to repeat in the data, for different or specific financial instruments (stocks, indices, etc.) and markets, frequently but without certainty, as they may be unobserved in some periods or under some extreme market conditions. Mandelbrot (1963) and Fama (1965) were the first to empirically question the Gaussian random walk hypothesis for prices, bringing to light various statistical properties of asset returns. Their studies paved the way for intensive empirical works trying to exhibit statistical regularities common across a wide range of financial datasets. See exhaustive discussions in references such as Cont (2001) or Teräsvirta and Zhao (2011). Cont (2001) in particular clearly states that, when characterizing stylized facts, there is a trade-off between the potential universality of the qualification and its quantitative precision.

Understanding the stylized facts of financial returns is crucial since it makes it possible to build reliable time series models of returns that can be used afterwards. First, such models can be used for risk control. Regulators are increasingly asking investment managers to have a better control of their risk modelling and of their potential losses. Investors are also very keen on evaluating the risk of a strategy, with metrics such as the Value-at-Risk. Second, "risk-based" investing has become a significant trend in the industry: this investment approach combines assets in a portfolio so that they all contribute to the portfolio's risk in the same way. Risk-based investing heavily relies on risk estimates. Again, thanks to the combination of volatility models and return distributions, computing such measures is easy once the models' parameters have been estimated.

Nevertheless, as this course focuses on equity factors, we will only explore stylized facts on stock returns and stock markets. The aforementioned references may help to discover the literature dealing with stylized facts on other asset classes.

7.2 Stylized facts on stock returns

7.2.1 Returns

7.2.1.1 Autocorrelations

A first stylized fact is that financial returns are known to be non-Gaussian. The main drawback of the Gaussian assumption for returns is that it does not, in general, give a precise description of the tails of the return distribution. The non-Gaussianity of returns is widely studied: it is for instance the main object of the book of Jondeau et al. (2007). But other stylized facts are particularly striking, and we recall here some of the facts already listed in Cont (2001).

Financial returns show for instance a low degree of linear correlation (namely, autocorrelation) when computed on raw returns (on a non intra-day time scale). The picture changes if the autocorrelation is computed on a non-linear transformation of returns: the same returns, when squared or taken in absolute value, present an autocorrelation that is, on the contrary, slowly decaying with time. This is what we observe on Figure 7.1, which plots the autocorrelation of daily arithmetic returns of the S&P500 index over the period 1970-2019, up to a 20-day lag (x-axis), the y-axis giving the value of the autocorrelation. We see (top) that the raw returns show no significant autocorrelation, whereas it is very persistent when a non-linear transformation is applied, whether in absolute value (middle) or squared value (bottom). Of course, empirical properties of return distributions change dramatically with the time scale. See in particular Chapter 2 of Bouchaud and Potters (2009). For stocks, the Gaussian assumption appears less and less inaccurate as the time scale increases: for monthly returns, Gaussian modelling may be plausible; but at the intra-day scale, 5-minute returns show empirical distributions that are really far from a Gaussian. Conversely, some linear correlation in raw return series tends to appear as the time scale increases up to monthly scales (Cont (2001) again).
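A minimal sketch reproducing the comparison of Figure 7.1, assuming r is a pandas Series of daily index returns (the data source is left unspecified):

from statsmodels.tsa.stattools import acf

def acf_comparison(r, nlags=20):
    return {
        "raw":      acf(r, nlags=nlags, fft=True),        # close to zero at all lags
        "absolute": acf(r.abs(), nlags=nlags, fft=True),   # slowly decaying
        "squared":  acf(r ** 2, nlags=nlags, fft=True),    # slowly decaying as well
    }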

The PACF (partial autocorrelation function) is another interesting tool. A formal presentation of the PACF would be beyond the scope of this course; any interested reader should find some material in any good time series book, such as Box et al. (2015). In short, the PACF is very similar to the autocorrelation function presented earlier, but when correlating, say, a variable at time $t$ with its past value at time $t-k$, it removes the influence of the lags in between, i.e. all the lags strictly smaller than $k$. An autoregressive model of order one typically displays a single non-zero value on its PACF, at lag 1, and zero elsewhere. Figure 7.2 shows such estimates in the case of the S&P500 daily returns, applied to raw returns (left) or to squared returns (right). The pattern is again very different for raw returns and for squared returns. As squared returns are used for computing volatility, this gives the insight that longer lags should be included in any modelling of volatility.
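A companion sketch for the partial autocorrelations of Figure 7.2, under the same assumption that r is a pandas Series of daily returns:

from statsmodels.tsa.stattools import pacf

def pacf_comparison(r, nlags=20):
    # Raw returns: essentially no significant partial autocorrelation.
    # Squared returns: several significant lags, suggesting that volatility models
    # should use information beyond the first lag.
    return {"raw": pacf(r, nlags=nlags), "squared": pacf(r ** 2, nlags=nlags)}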

Main Bibliography

• Bouchaud and Potters (2009)

• Box et al. (2015)

• Ielpo et al. (2017)

Figure 7.1: Autocorrelogram for daily S&P500 returns over the period 1970-2019. Top: raw returns. Middle: absolute returns. Bottom: squared returns. X-axis is the number of days to compute the autocorrelation. Y-axis is the value of the autocorrelation.

7.2.2 Volatility

7.2.2.1 Stylized facts

The time-varying dimension of stock returns is usually described using their moments: the empirical finance literature has long diagnosed the fact that the conditional distribution of returns is time varying. Clearly, returns are not stationary (see Section 2.3 for a definition), as at least their historical volatility exhibits time variation. This lack of stationarity is also suspected to explain part of the known stylized facts about returns: the "fat tails" of financial returns' distributions can be reproduced using conditional distributions with thinner tails combined with a stochastic volatility model.

Unfortunately, volatility is the only moment-related characteristic of financial returns that exhibits a measurable persistence. Expected returns - the first moment of the returns' distribution - cannot display persistence by nature. Should there be persistence, returns would be predictable from their past: if a strong autocorrelation appeared in returns, it would be easy to forecast future returns with a simple contrarian strategy in case of negative autocorrelation (selling past winners and buying past losers) or a trend-following strategy in case of positive autocorrelation (selling past losers and buying past winners). Such a pattern cannot be systematically present: investors would exploit this feature until it disappears.

The investigation of rolling estimates of volatility clearly shows that volatility is time dependent and that returns are therefore non-stationary. An example of such a pattern is shown on Figure 7.3. Episodes such as the 1987 crash and the 2008 crisis typically display heightened levels of volatility that contrast with long-lasting periods of calmer markets. Similar patterns can be found across various types of risky assets. Volatility is time varying but exhibits persistence: when volatility is high or low, it remains so for a sustained period, as is now obvious from Figure 7.3. This phenomenon, known as volatility clustering (first identified by Mandelbrot (1963) – but see also Lo (1991), Ding et al. (1993), Giardina and Bouchaud (2003), or Cont (2007)), is usually isolated by representing the autocorrelation function of squared returns. Figure 7.1 displays the autocorrelogram – that is, the structure of the $\hat{\gamma}_h$ (as described in Section 2.3) as a function of $h$ ranging from 0 to 20 days – for the returns on the S&P500 over the 1970-2019 period. The absence of autocorrelation of raw returns is in line with the efficient market hypothesis. Absolute returns are a proxy for volatility and squared returns are a proxy for variance; both have a very different autocorrelation structure. The pattern is more striking for absolute returns, as the autocorrelation function decays very slowly towards zero. This is precisely why many volatility models describe the next volatility level as a function of the past volatility level.

Figure 7.2: Partial autocorrelation of S&P500 returns over the period 1970-2019. Left: raw returns. Right: squared returns. X-axis is the number of days to compute the autocorrelation. Y-axis is the value of the partial autocorrelation.

Another famous stylized fact is known under the name of leverage effect (even if this phenomenon receives different designations in the financial economics literature). In the case of equity markets, there is empirical evidence that negative returns lead to a stronger surge in volatility than positive ones. It is sometimes also called the returns-to-volatility feedback effect, as the explanation of the phenomenon may come from market participants' risk aversion. Market participants being on average long equities, their risk lies on the downside of the market. A drop in equities' prices can lead to a panic move, creating a burst of volatility. This translates into a negative relation between past returns and the subsequent returns' volatility. We illustrate this effect in Figure 7.4, where we plot a 60-day volatility against a lagged (with a one-month lag) 60-day averaged return. The anti-correlation between the two quantities is striking (we also indicate the regression coefficient of the volatility on past returns). Clearly, we see that negative returns are followed one month later by a higher volatility. This fact is in particular documented by Schwert (1989), Nelson (1991), Glosten et al. (1992) or Braun et al. (1995).

Figure 7.3: Rolling 20-day estimates of the annualized volatility of the S&P500 using daily data over the 1987-2015 period.

Figure 7.4: Illustration of the leverage effect. X-axis: rolling, 60-days average returns with a 20 days lag. Y-axis: rolling, 60-days annualized volatility. S&P500 daily returns over the 1987-2015 period.
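A minimal sketch of the leverage-effect diagnostic of Figure 7.4, assuming r is a pandas Series of daily returns; the 60-day windows and 20-day lag follow the figure, while the sqrt(252) annualization is our own assumption.

import numpy as np
import pandas as pd
import statsmodels.api as sm

def leverage_effect_slope(r: pd.Series, window: int = 60, lag: int = 20) -> float:
    vol = r.rolling(window).std() * np.sqrt(252)      # rolling annualized volatility
    past_ret = r.rolling(window).mean().shift(lag)    # lagged rolling average return
    df = pd.concat({"vol": vol, "past_ret": past_ret}, axis=1).dropna()
    fit = sm.OLS(df["vol"], sm.add_constant(df["past_ret"])).fit()
    # A negative slope is the signature of the leverage effect.
    return fit.params["past_ret"]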

7.2.2.2 Volatility models

There are many types of models that may potentially be used for the purpose of modelling volatility, but most of them share the very same origin: GARCH models. Engle and Bollerslev (1986) propose the following dynamics for the variance:

\[
\sigma_t^2 = \omega + \alpha\,\sigma_{t-1}^2\,\eta_{t-1}^2 + \beta\,\sigma_{t-1}^2, \qquad (7.1)
\]
where $\omega$, $\alpha$ and $\beta$ are usually assumed to be real-valued parameters so that the variance dynamics remains positive through any time series. It is important to stress that in this paragraph $\alpha$ and $\beta$ have nothing in common with the usual notions of alpha and beta coming from the CAPM and exposed supra. We keep those notations as they are very common for GARCH-like models. The variance dynamics is stationary if these three parameters fulfil the following constraint:

\[
|\alpha + \beta| < 1. \qquad (7.2)
\]

The unconditional variance of the process is therefore
\[
\mathbb{E}\!\left[\sigma_t^2\right] = \frac{\omega}{1 - \alpha - \beta}. \qquad (7.3)
\]

Given that the model presented here only includes one lag for both $\sigma_t$ and $\eta_t$, the model is usually referred to as a GARCH(1,1) model. The lag specification used to be a significant empirical question, but the decent performance of the GARCH(1,1) in modelling financial returns' volatility led many to adopt a (1,1) structure. One of the main advantages of such a choice is that the estimation is considerably eased. Replacing $\mathbb{E}[\sigma_t^2]$ by an empirical estimate of the quantity creates a constraint on one of the parameters (usually $\omega$), leaving the user with only two parameters (usually $\alpha$ and $\beta$). These two parameters can easily be estimated using a two-dimensional grid search.
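A minimal sketch of this two-parameter estimation, assuming r is a one-dimensional numpy array of demeaned daily returns: omega is pinned down by matching the unconditional variance omega/(1 - alpha - beta) to the sample variance ("variance targeting"), and (alpha, beta) are found by a Gaussian-likelihood grid search.

import numpy as np

def garch11_loglik(r, alpha, beta):
    vbar = r.var()
    omega = vbar * (1.0 - alpha - beta)               # variance targeting
    sig2 = np.empty_like(r, dtype=float)
    sig2[0] = vbar                                    # initialize at the sample variance
    for t in range(1, len(r)):
        # sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2, with eps = r
        sig2[t] = omega + alpha * r[t - 1] ** 2 + beta * sig2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + r ** 2 / sig2)

def garch11_grid_search(r, step=0.01):
    best, grid = None, np.arange(step, 1.0, step)
    for alpha in grid:
        for beta in grid:
            if alpha + beta >= 1.0:                   # stationarity constraint (7.2)
                continue
            ll = garch11_loglik(r, alpha, beta)
            if best is None or ll > best[0]:
                best = (ll, alpha, beta)
    return best                                       # (log-likelihood, alpha, beta)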

7.2.3 Skewness

Financial returns display a non-zero skewness: in the most general case, depending on the type of assets, the skewness can either be positive or negative, depending on which tail of the returns' distribution is the thickest. A common reading is that a thick tail indicates the presence of jumps. When an asset has a tendency to be impacted by negative jumps – i.e. large negative returns that happen rarely – then its distribution should display a negative skewness. There is an important remark to make: an apparent negative skewness is not always due to a left-tail asymmetry. The leverage effect may create an asymmetry in the distribution of returns, as negative returns tend to be larger in absolute terms than positive ones. It is therefore complex to disentangle the asymmetry coming from leverage effects from the asymmetry coming from negative or positive jumps.

One argument to dismiss the Gaussian nature of returns is related to empirical measures of the skewness and kurtosis of returns. As a Gaussian distribution has kurtosis $k = 3$, one will often find corrected values in which $k$ is implicitly shifted so that the Gaussian benchmark is 0: the "excess kurtosis" is the quantity $k - 3$. Stock returns have in general a negative skewness and a pronounced excess kurtosis (see for instance Jorion (1988), Bates (1996), Hwang and Satchell (1999), Harvey and Siddiqui (1999), Harvey and Siddiqui (2000) among others). But this is not always true, and the skewness of stocks is in general modest in amplitude. This is illustrated by Figure 7.5, where we plot the distribution of the unconditional skewness of the daily returns of each S&P500 component over the period 2002-2018. This distribution is in blue: we see that the skewness is dispersed across stocks with no clear sign. However, we plot in red the value of the unconditional skewness of the index returns on the same period. The value is negative. This is mainly due to the fact that stocks tend to correlate in bad times, which explains why negative returns are amplified at the index level, pushing the skewness of the index towards negative values.
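A quick sketch for computing these moments, assuming stock_returns is a pandas DataFrame of daily returns (one column per stock) and index_returns a pandas Series; note that scipy's kurtosis already returns the excess kurtosis k - 3 by default (Fisher definition).

from scipy.stats import skew, kurtosis

def skew_kurt_summary(stock_returns, index_returns):
    return {
        "stock_skewness": stock_returns.apply(skew),        # one value per stock
        "index_skewness": skew(index_returns),
        "index_excess_kurtosis": kurtosis(index_returns),   # Fisher definition: k - 3
    }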

Figure 7.5: Unconditional skewness of the daily returns of the S&P500 components over the 2002-2018 period. The distribution of the skewness of stocks is in blue, one point for each stock. The equivalent value for the index is in red.

Chapter 8

Markets’ Heuristics

8.1 Objectives of this chapter

8.1.1 Motivations

This chapter focuses on empirical figures and data only. We won't use much textbook analysis or refined statistics here; we want to put the stress on intuition and very general facts. Indeed, when working with equities, it is worth having in mind some heuristics concerning equity markets. We won't use cutting-edge estimators for volatility, beta or correlation. Our raw material here will mainly be stock returns, more precisely daily total returns (both for stocks and indices), meaning that those returns are computed from price increments and dividends (see Paragraph 2.1.1). We want to give here a flavour of the main quantities that are often manipulated when we build factors, and to understand, for instance, a newsfeed like the one illustrated in Figure 8.1. What is the correlation between markets? What is the correlation between stocks? How can we say that the level of risk of a market is high or not? What are the links between the beta and quantities such as liquidity, market capitalisation, etc.? This is the aim of this short chapter, since equity factors and anomalies are often based on such simple quantities. It is also important to understand what is common among markets and what is more geographically dependent.

Figure 8.1: Apple Inc. - Source : Yahoo on 3rd of July 2019.

8.1.2 Computations and numerical examples

This chapter is then to be considered as a big numerical example. We will look at two geographical zones in particular, US and Europe. We will optionally include in this study three other zones (Japan, Canada and Australia) for illustration purposes. Let's recall that we use exactly the same pool description as in Section 5. As a shortcut, to save space in figures, UL will stand for the US zone, EL for Europe, JL for Japan, CL for Canada and PL for Australia. With this in mind, we plot in Figure 8.2 the aggregated market capitalisations by zone in US dollars. The representation is in log scale. US is clearly the biggest pool, followed by Europe, Australia being the smallest.

Figure 8.2: Aggregated market capitalization (in US dollars) of the selected stocks by zone, across time.

Let's define here the raw quantities we compute in this chapter. At some stage, we will adapt these quantities (and use medians, means, rolling versions, etc.) for the sake of clarity of the presentation. They will be studied from the 1st of January 2000 up to the 1st of January 2019.

• "Market" or index: we take one representative index for each geographical zone. The returns of the chosen index are then loosely assimilated to the "returns of the market" and a proxy of its state, even if the stocks of our pool are or are not part of the index composition. We choose for the US zone the S&P500 index; for the Euro zone the BE500 index; for Australia the S&P/ASX200 index; for Canada the S&P/TSX index; finally, for Japan, the Nikkei 225 index. This choice is somewhat arbitrary yet allows us to get realistic results for our exercise.

• Volatility: unless stated otherwise, we compute volatility as the annualized standard deviation of returns.

• Correlation: unless stated otherwise, correlation is computed as the Pearson correlation coefficient on weekly aggregated returns; computing correlation on weekly aggregates allows us to take into account short-term correlation effects of returns.

• Beta: unless stated otherwise, we compute beta as the rolling ratio of the covariance of the returns with the benchmark divided by the variance of the benchmark's returns (this is equivalent to the slope of a simple OLS regression with an intercept); a short computational sketch is given after this list.

• Market Capitalisation: product of the share price times the number of outstanding shares, ex- pressed in US Dollars with dynamic fx.

• Turnover: average daily liquidity in US Dollars with dynamic fx.
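As announced in the Beta bullet, here is a minimal sketch of these conventions, assuming stock and index are hypothetical pandas Series of daily returns with a DatetimeIndex; weekly aggregation is done by compounding, which is one possible reading of "weekly aggregated returns".

import pandas as pd

def weekly_pearson_correlation(stock: pd.Series, index: pd.Series) -> float:
    # Aggregate daily returns into weekly returns before computing the correlation.
    sw = (1 + stock).resample("W").prod() - 1
    iw = (1 + index).resample("W").prod() - 1
    return sw.corr(iw)

def rolling_beta(stock: pd.Series, index: pd.Series, window: int = 250) -> pd.Series:
    # Rolling covariance with the benchmark divided by the benchmark variance.
    return stock.rolling(window).cov(index) / index.rolling(window).var()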

8.2 Risk

First, let's explore some risk heuristics. We take here some simple risk proxies: volatility, beta and correlation. These are simple quantities that do not involve complex modelling, yet it is important to have such figures in mind.

8.2.1 Volatility

8.2.1.1 Volatility at the market level

The level of volatility is an early indicator of the average level of risk in the market. We plot in Figure 8.3 the volatility of the market for each geographical zone, computed as the annualized 250-day rolling standard deviation of the daily returns of the index chosen to represent each zone, as defined in Section 8.1.2.

Here, the returns have typically been aggregated into weekly returns to account for the autocorrelation of returns. A typical value for an equity market is between 15 and 20 percent. Of course there is some variability, with low values around 10 percent, for instance before the 2008 financial crisis or in 2017. Due to the rolling estimation, the high volatility caused by the financial crisis also materializes in the values estimated for 2009 and 2010. During this period, the perceived level of risk is two or three times higher than the common value, spiking up to 40-45 percent. We see that during the crisis all the zones spike at the same moment, that Canada, Australia and the US are well aligned since 2018, and that the average risk in Japan is generally higher.

Figure 8.3: Market volatility by zone for the 5 geographical zones. Volatility is computed as an annualized 250-day rolling standard deviation of daily returns for each chosen index.
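A minimal sketch of the rolling estimate of Figure 8.3, assuming index_returns is a pandas Series of daily index returns; the sqrt(250) annualization factor is our assumption, consistent with the 250-day window.

import numpy as np
import pandas as pd

def rolling_annualized_vol(index_returns: pd.Series, window: int = 250) -> pd.Series:
    # 250-day rolling standard deviation of daily returns, annualized.
    return index_returns.rolling(window).std() * np.sqrt(250)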

8.2.1.2 Volatility at the stock level

Let's now look at volatility at the stock level. We have seen that the level of volatility can vary strongly in time for the market. Knowing volatility helps, for instance, to understand the Low Vol and Low Beta anomalies. Here we choose to focus on a given period of time, to avoid estimating quantities over periods so long that a stock can change drastically between the beginning and the end of the period. For illustrative purposes, we focus on a 2-year period, from the 1st of January 2017 to the 1st of January 2019. This is a totally arbitrary choice.

The distribution of stock volatility is represented in Figure 8.4: the plot is drawn with the distribution of volatility for the two biggest zones, US and Europe. Here we compute volatility in a specific way, in order to have one volatility figure for each stock present at least one year in the pool, so that one stock accounts for one point in the histogram. This value for one stock is the median of the time series obtained as the annualized 250-day rolling standard deviation of daily stock returns (roughly 500 points per stock, since computed on the 2-year period from the 1st of January 2017 to the 1st of January 2019). We trim aberrant values to keep figures lower than one. We clearly see that the average level and the mode of stock volatility are higher than the average value of the benchmark on the same zone. Values and shapes of the distribution for Europe and the US are however quite similar. The yearly volatility for a stock is easily around 25 or 30 percent, rather than around 15 or 20 percent for the whole market as represented by the index. This is explained by a diversification effect: the index is built by aggregating individual stocks, which gives it a lower volatility. We add a complementary table, Table 8.1, that presents the quantiles of the distribution for each zone. In particular, the quantities for the US and Europe in the table are those of the distributions in the figure. We can observe that the smaller the zone, the higher the probability of having small stocks, and the higher the upper quantiles of volatility. Small stocks, with small nominal prices, are generally associated with higher volatility, as we will see below.

Figure 8.4: Distribution of stock volatility (Left: US - Right: Europe).

                              US   Europe   Japan   Canada   Australia
Volatility: Quantile = 0.25   22     22      25       22        27
Volatility: Median            29     28      29       34        41
Volatility: Quantile = 0.75   42     36      34       54        65

Table 8.1: Quantiles of the stock volatility distribution by zone. Annualized, in percent. 2017-2019.

8.2.2 Betas

It is also important to have in mind the typical values of betas; this is useful when studying the Low Beta anomaly, obviously. It is commonly thought that beta is naturally around 1. This is indeed the right order of magnitude, since a beta commonly lies between 0 and 2, yet it shows some variability.

First, let's recall of course that a beta needs to be estimated. For this we again focus on the 2-year period from the 1st of January 2017 to the 1st of January 2019. For each stock we compute one β on this whole period as the beta of the regression (with an intercept) of the daily stock returns on the daily returns of the index representing the zone, as detailed supra. The distribution of the computed betas is represented in Figure 8.5 for the two biggest zones, US and Europe. Table 8.2 reports the quantiles of those quantities for all the zones. We see that, generally, beta is positive and centered around 1. The distribution is quite symmetric in the US, but slightly skewed to the left for Europe. When a stock has a beta lower than one, or even negative, it is a strong diversifier that may help to offset the residual beta of a portfolio. This variability of the beta is important for portfolio construction, since not controlling the beta of a portfolio, or making simplistic assumptions on the beta distribution, could lead to disastrous effects in the portfolio exposure. An important illustration is e.g. given in Ciliberti et al. (2019), more precisely in Exhibit 3, top panel, of this paper. In this graph, β is a non-monotonic function of market capitalisation. Each long-short portfolio split and ranked along market capitalization may exhibit a natural β bias whose size and amplitude depend on its cardinality.

                      US    Europe   Japan   Canada   Australia
β: Quantile = 0.25   0.73    0.41    0.55     0.54      0.56
β: Median            0.96    0.65    0.70     0.79      0.77
β: Quantile = 0.75   1.23    0.91    0.87     1.06      0.99

Table 8.2: Quantiles of the stock β distribution by zone.

8.3 Correlations

For a portfolio defined on a given geographical zone, diversification is obtained by mixing stocks with various correlations, even if those stocks are not all decorrelated. This is one message from the theory of Markowitz. We have already seen supra that this explains why benchmarks or indices have in general a lower risk than the idiosyncratic risk that individual stocks bear for a large part of their distribution.

Figure 8.5: Distribution of stock β (Left: US - Right: Europe).

Geographical diversification is, however, another effect at play when building a worldwide portfolio. Markets across countries show a variable correlation through time; they are nonetheless quite correlated. Figure 8.6 presents the correlation between the geographical zones through the correlation between indices' returns. Those correlations are the Pearson correlation coefficients between index returns aggregated weekly over the whole period from the 1st of January 2000 up to the 1st of January 2019. They are thus unconditional correlations, meaning computed on the whole period. We see that the typical correlation between two geographical zones is between 50 and 80 percent, with a typical value around 60 percent. In particular, the US and Europe are a bit more correlated since, as the major economies in the world, they are tightly linked through macroeconomic channels. Canada, for historical and geographical reasons, is tightly linked to the US, which explains the strength of their correlation. Australia (PL) is a bit less correlated with the others, which is not surprising since Australia is known to be a polarized market, mainly driven by a few sectors such as mining.

Of course those quantities vary in time. This is shown by Figure 8.7. This figure is, intuitively, the dynamic version of the average of the off-diagonal terms of the matrix of Figure 8.6. We estimate the one-year rolling correlation between zones based on indices' weekly returns, and average the values of the off-diagonal terms of the correlation matrix. This is represented in green in the Figure. We plot in red a smoothed version of this curve with a 2-year exponentially weighted moving average. We recover a median level of correlation around 50-60%, but we see that this value varies in time, spiking in crisis times, which is a known stylized fact.

This variability in time is also shown by Figure 8.8, which shows the box-plot representation of the distribution of 1Y-correlation for each possible couple of zones in our experiment. To draw this plot, for each couple of zones we compute the correlation of indices' returns in a rolling fashion, with trailing one-year weekly aggregated returns. This gives us 52 (weeks) times 19 (years) points of correlation for each couple, and we represent the distribution as a box-plot. The red dot is the value of the 1Y-correlation at the end of the sample (on the 1st of January 2019). We recover the same kind of messages (variability and orders of magnitude).
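A minimal sketch of the average rolling cross-market correlation of Figure 8.7, assuming weekly is a pandas DataFrame of weekly index returns with one column per zone (hypothetical input).

import numpy as np
import pandas as pd

def avg_offdiag_corr(weekly: pd.DataFrame, window: int = 52) -> pd.Series:
    out = {}
    for end in range(window, len(weekly) + 1):
        corr = weekly.iloc[end - window:end].corr().to_numpy()
        iu = np.triu_indices_from(corr, k=1)
        out[weekly.index[end - 1]] = corr[iu].mean()   # average off-diagonal correlation
    return pd.Series(out)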

Figure 8.6: Unconditional Pearson correlation between markets. Weekly returns, period: 2000-2019.

Figure 8.7: Rolling correlation between markets. One year rolling correlation of weekly returns between indices (green) and its 2 year smoothed average (red).

Figure 8.8: Distribution of rolling 1Y-correlation on weekly returns between zones, represented as box-plot. The red dot is the correlation on the 1st of January 2019.

8.4 Stock size

8.4.1 Market capitalization

We provide here descriptive figures on two quantities that are proxies of stock size: market capitalization and turnover. Market capitalization is the product of the number of outstanding shares times the price per share; it is thus the total amount of equity available on the market. The notion of shares is subtle (outstanding or not, available to buy or not, carrying voting rights or not, etc.) but we do not enter into the details here. Market capitalization may be viewed as the market's current view of the future cash flows of the company, dividends included. Of course, it represents a subjective view of the market on its future projects and perspectives. Not all of the market capitalization is available for sale at any moment. It is however the most common public estimation of the size of a company. It should also reflect, in a way, the price that the common shareholders would get if the company were about to close its activity and its assets were liquidated, even if this is not exactly true (it should however be of the same order of magnitude).

Figure 8.9 presents the distribution of market capitalization in the US and in Europe. We have already seen in Figure 8.2 that the aggregated amount was relatively higher in the US. Not only because it is the biggest pool of our experiment: the biggest stocks are generally to be found in the US. Without entering into the fine details of dual quotation, the biggest stocks in the US (Apple, Microsoft, Google-Alphabet, Amazon, Facebook and... Ali Baba, originally however a Chinese company!) have a market capitalization between 500 and 1000 billion US dollars. In Europe (typically Nestle, Novartis or HSBC) the top market capitalisations are between 300 and 700 billion US dollars. The quantiles of the distribution for each zone are given in Table 8.3. Market capitalization is again given as one value per stock: the value for one stock is the median of the time series made of the 250-day rolling median over the two-year period 2017-2019 (500 points). Here we see that the distribution of market capitalization is very heavy-tailed and is easily represented parametrically by a power law (see Gabaix (2008)), which is a very famous empirical fact. This means that the largest capitalisations are far above even the highest quantiles.

Figure 8.9: Distribution of market capitalization (US Dollars - Left: US - Right: Europe).

                                US    Europe   Japan   Canada   Australia
Market Cap.: Quantile = 0.25    1.0     0.4     0.4      0.4       0.04
Market Cap.: Median             3       1.5     1.0      1.2       0.3
Market Cap.: Quantile = 0.75   10       5.8     3.5      3.2       1.4
Market Cap.: Maximum          800     627     185      110       110

Table 8.3: Quantiles (rounded values) of the market capitalization distribution by zone. In billions of US dollars. One figure per stock, as a median over the 2-year period 2017-2019.

8.4.2 Turnover

We are interested here in the turnover, i.e. the amount traded each day in dollars. This quantity is important since it defines the level of trading activity of the stock and is also a proxy of the size of the company. To get a hint of why this quantity matters, one may refer to Ciliberti et al. (2019) as described in Section 5.1.3. Turnover is obtained as the product of the number of traded shares (changing hands) times the price of each share. As the quantity is price-dependent, it is very sensitive to price movements through time. With spikes in volatility as well, it is common to consider medians through time of this quantity to get a smooth value.

Figure 8.10 presents the distribution of turnover in Europe and in the US. The quantiles of the distribution of turnover for each zone are given in Table 8.4. Turnover is again given as one value per stock: the value for one stock is the median of the time series made of the 250-day rolling median over the two-year period 2017-2019 (500 points). We see here again that the distribution of turnover is very heavy-tailed, the maximum values being far above the 75% quantile (see again Ciliberti et al. (2019)). As we will see below, turnover and market capitalisation are tightly linked, but the raw turnover figures are very different from one zone to another. Let's note that there is roughly a factor of 3 between the values for Japan (which is however a developed economy) and those for the US.

Figure 8.10: Distribution of daily turnover (US Dollars - Left: US - Right: Europe).

                             US      Europe   Japan   Canada   Australia
Turnover: Quantile = 0.25     4.8      0.4      1.3     0.9       0.05
Turnover: Median             18.3      2.8      4.8     4.2       0.6
Turnover: Quantile = 0.75    51.2     20.7     15.1    13.2       5.2
Turnover: Maximum          2250.0   2040.0    821.0   256.7     208.5

Table 8.4: Quantiles (rounded values) of the daily turnover distribution by zone. In millions of US dollars. 2-year period 2017-2019.

8.5 Biases and links between variables

We explore in this section the biases and links between the variables we have just described. Consequently, the notions of market capitalization, β, turnover and volatility that are compared here are computed as in the previous sections. In particular, we have for each graph (Figures 8.11, 8.12, 8.13) one point per stock. This point, for each quantity and each stock, is obtained as the median value of the 2-year time series (500 points) made of the rolling one-year average of the quantity for each stock, on the period 2017-2019.

We systematically investigate the bias between market capitalisation and another quantity, which is respectively turnover in Figure 8.11, volatility in Figure 8.12, and β in Figure 8.13. We draw a scatter plot (with market capitalisation in log scale on the x-axis) and a regression line for each graph.

We see in Figure 8.11 that turnover and market capitalisation covary and are tightly linked. Even if there is a difference between them as proxies of size (Ciliberti et al. (2019) again), we see that they carry the same kind of information and have a strong correlation, around 90%. In Figure 8.12 we see also that the link between volatility and market capitalisation is weaker than with turnover, but is still quite strong (absolute correlation around 60%). The biggest stocks are generally those that have a lower volatility; smaller stocks have, generally speaking, a higher volatility.

Figure 8.11: Regression of Turnover vs Market Capitalisation. Turnover and Market Capitalisation (log-scale) are in Millions.

Finally, the link with β is more difficult to comment on. Let's recall that β is the sensitivity of the stock's returns to the returns of an index that is made of the highest market capitalisation stocks (in addition to other criteria). So stocks that participate in the index construction should exhibit a β value around one. The plot we get also depends strongly on the pool and the index we choose. Here we see that in the US, the mean value of beta, whatever the capitalisation, is well centered around one. In Europe, we see conversely that the biggest stocks have a beta close to one, while small stocks have a beta that is lower on average.

In conclusion, the biggest stocks have a low volatility, a high turnover and a beta close to one, while the smallest stocks have a beta lower than one, a low turnover and a high volatility. This is a crude characterization, yet it describes the landscape quite well at first order.

Figure 8.12: Regression of Volatility vs Market Capitalisation. Volatility is annualised and Market Capitalisation (log-scale) is in Millions.

Figure 8.13: Regression of β vs Market Capitalisation. Market Capitalisation (log-scale) is in Millions.

8.6 Heuristics for stock characteristics

We want to conclude this chapter by giving some figures on classical quantities that are looked at by investors as well as by equity analysts. Dividends and what we call multiples are major elements used to analyse the behaviour and the health of companies.

8.6.1 Dividends

Let's start with dividends. Dividends are paid by companies to shareholders. Normally, they are linked to company valuation, since they are considered as flows: if company valuation is based on the present value of future flows, the perspective of receiving dividends has a positive impact on the subjective view of investors. The decision for a company to pay a dividend is a corporate decision, made at the top level, that is closely linked to the business plan of the company. Some young companies may choose to pay dividends to attract shareholders, or rather to delay the first dividend payment in order to reinvest gains in future projects. Mature companies are often believed to be dividend payers, even if it is not always true: for years, Apple considered that it was not important to pay dividends, since the management thought that the natural price appreciation of the stock should be enough to validate the job done by the company. It is however perceived as a negative sign when a company stops paying dividends. Consequently, when a company starts to pay dividends, it keeps on paying for many years, and stopping to pay dividends is perceived as extreme by market participants.

Table 8.5 gives the quantiles of dividends that are typically paid. The quantities given here are very indicative: we give typical values obtained as the median of the quantiles computed each year over the period from the 1st of January 2000 to the 1st of January 2019. The dividends paid are computed only on paying stocks, expressed as a percentage of the share price, and aggregated over a year. Those quantities are in fact very stable over this period, hardly affected by the global financial crisis of 2008. We can just note that the average dividend paid is increasing over 2010-2019 in Japan, which is not the case for the other geographical zones. A stock paying dividends pays roughly 2.5% of its share price in a year: this is the main value to keep in mind.

The other interesting quantity is the histogram of the number of payments per year for a stock. This is represented in Figure 8.14. We see that in the US, more than half of the stocks do not pay dividends. This proportion is lower in Europe, around 30%. We also see that, typically, US stocks pay dividends 4 times a year, so once a quarter. In Europe, things are different since stocks that pay dividends do it only once a year. Figure 8.15 gives complementary information: the dividends' seasons. This is the histogram of the proportion of dividend payments depending on the month in which the payment has been announced. Clearly, in the US there is no preferential period, and payments may occur at any moment of the year. In Europe, the typical yearly payment occurs between April and May.

                            US    Europe   Japan
Dividends: Quantile = 0.1   0.5     1       0.75
Dividends: Median           2.0     2.5     1.5
Dividends: Quantile = 0.9   5.5     5.5     3

Table 8.5: Quantiles of total dividends paid per year. Unit: percent of the share price.

Figure 8.14: Proportion of the number of dividend payments per year.

Figure 8.15: Dividends' seasons: proportion of dividend payments by month.

8.6.2 Earnings and multiples

Another very important set of numbers is what is called multiples. Those quantities are typically the ones we read on Figure 8.1 under names such as "PE" or "EPS". Here again, our developments are very descriptive. We give typical values obtained as the median of the quantities computed each year (and averaged over stocks) over the period from the 1st of January 2000 to the 1st of January 2019. We focus on 4 typical values: Price to Earnings, Price to Book Value, Price to Sales and EPS (Earnings Per Share). Companies publish on a regular basis (quarterly, semi-annually or annually) fundamental figures in their financial statements. Those statements are made of three parts: the Income Statement (how much the company has earned over the last period, say, a quarter), the Cash-Flow Statement (how much cash in real currency has transited through the bank accounts of the company) and the Balance Sheet (the statement of all the assets, debts and liabilities of the company). Those fundamental statements summarize the financial health of the company and sum up, in a way, how the business is run and how value is created. In particular, Book Value is the difference between Total Assets (what the company owns) and Total Liabilities (what the company owes). If the company stops its business, this is the amount of money that goes to shareholders once the creditors have been repaid. Book Value is therefore of the same order of magnitude as the market capitalisation. Book Value and financial statements are typically the proxies used to compute the Value or the Quality factors, as explained in Chapter 5. Here are the definitions of the quantities we look at (a short computational sketch follows the list):

• Price to Earnings: Market Capitalisation divided by Net Income (annualized).

• Price to Book Value: Market Capitalisation divided by Book Value (annualized).

• Price to Sales: Market Capitalisation divided by Sales (annualized).

• EPS: Net Income (annualized) divided by Number of Shares Outstanding.
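As announced above, a small illustrative computation of these multiples with purely hypothetical inputs (amounts in a common currency, income and sales being yearly aggregates).

def multiples(market_cap, net_income, book_value, sales, shares_outstanding):
    return {
        "price_to_earnings": market_cap / net_income,
        "price_to_book": market_cap / book_value,
        "price_to_sales": market_cap / sales,
        "eps": net_income / shares_outstanding,
    }

# Example: market cap 120 bn, net income 10 bn, book value 40 bn, sales 30 bn,
# 1 bn shares outstanding -> PE = 12, PB = 3, PS = 4, EPS = 10.
print(multiples(120e9, 10e9, 40e9, 30e9, 1e9))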

To go one step further and discover a bit more about fundamental accounting and corporate finance, one may refer to Brealey et al. (2003).

                      US    Europe   Japan
Price to Earnings    12.0    12.0    15
Price to Book         3       2.5     2.0
Price to Revenue      4       3.5     2.0
Earnings Per Share    0.9     0.9     1.5

Table 8.6: Typical values of multiples by zone for the period 2000-2019.

Chapter 9

A Focus on Covariance Matrix

We spoke about statistical factors in Section 3.2.3 and detailed their use throughout Section 3.2.3.2. We now extend the statistical analysis of the covariance matrix of stock returns in the present chapter. This statistical exercise lies at the conjunction of two streams of research: a factorial interpretation (through a PCA framework) and portfolio allocation, as we briefly explain in Section 9.1.

9.1 One Word On The Stock Covariance Matrix

9.1.1 Mean-variance allocation

Let us have a word on mean-variance allocation: this will help to understand the duality of our problem. The mean-variance optimization problem can be derived from an expected-utility formulation, as developed e.g. in Chapter 3 of Ielpo et al. (2017). Starting with a universe of assets, an investor wants to select the optimal weight of each asset, while controlling the expected performance and the expected level of risk of the portfolio. Mean-variance allocation aims at building such a portfolio by solving a one-period optimization program, through the maximization of the expected return of the portfolio for a given level of risk. This level of risk depends on the risk budget that the investor is ready to accept. It requires only two objects: an estimation of the expected returns of the assets, and an estimation of the risk structure between assets. When returns are assumed to be i.i.d. in time and to follow a Gaussian distribution with moments µ and Ω (the allocation model being decently robust to this assumption), the optimal weights of the portfolio w⋆ are (λ being the Lagrange multiplier):

w⋆ = (1/λ) Ω⁻¹ µ,

where λ is determined through the constraint on the variance.
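
A minimal numerical sketch of this formula (with made-up inputs, not taken from the text): assuming estimates `mu` and `omega` are given and the risk budget is expressed as a target portfolio volatility `sigma_target`, the Lagrange multiplier λ is recovered from the variance constraint w⋆′ Ω w⋆ = σ_target².

```python
import numpy as np

# Hypothetical inputs: expected returns and covariance matrix of 3 assets (made-up numbers).
mu = np.array([0.05, 0.07, 0.04])
omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.03]])
sigma_target = 0.10  # targeted portfolio volatility (risk budget)

omega_inv = np.linalg.inv(omega)
direction = omega_inv @ mu                                 # the direction Omega^{-1} mu
lam = np.sqrt(mu @ omega_inv @ mu) / sigma_target          # lambda from w' Omega w = sigma_target^2
w_star = direction / lam

print("optimal weights   :", w_star)
print("portfolio vol     :", np.sqrt(w_star @ omega @ w_star))  # matches sigma_target
```

The direction of the solution is entirely given by Ω⁻¹µ; λ only rescales it so that the portfolio matches the risk budget.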

9.1.2 Weights instability

Practically, having at hand estimators µ̂ and Ω̂⁻¹, multiplying them in the latter equation leads to unstable solutions or extreme, over-weighted vectors. Merton (1980) already noted the sensitivity of the resulting empirical portfolio to slight modifications of the expected returns. Jobson and Korkie (1980), Jobson and Korkie (1981b), Jorion (1986), Frost and Savarino (1986), Frost and Savarino (1988), Michaud (1989), Best and Grauer (1991), Chopra and Ziemba (1993), among others, highlighted such problems when inverting the sample covariance matrix, since variations or errors in the return vector µ̂ are dramatically amplified. Coordinates with abnormal weights are unfortunately (or logically?) those corresponding to the greatest estimation error. Michaud (1989) even calls mean-variance optimization "error maximization" and recalls that we have to make a clear distinction between "financial optimality" and "mathematical optimization".

This fact is a real limitation for practitioners. Proceeding to a rolling estimation of Ω̂ and µ̂ at two consecutive dates t and t+1, we end up with estimates µ̂_t and µ̂_{t+1} that are very close; yet, even if Ω̂_t and Ω̂_{t+1} are also quite close (and Ω̂ may even be kept fixed), w⋆_t and w⋆_{t+1} may be far apart due to the nearly singular nature of Ω̂⁻¹. The same feature remains with any estimator of µ varying (even slightly) in time, whether it comes from a historical estimation or not. This is a drawback for the investor since it implies a huge turnover of the portfolio, a deep rebalancing of the positions, and a loss of interpretation of the views he or she tries to implement. This unwanted feature generates practical limitations, turnover and high transaction costs. Most of all, it blurs the understanding of what is going on, and the potential profitability of the set-up is wrecked by transaction costs. The instability problem does not come from the return estimator itself but from the spectral configuration of Ω̂. When N is less than T but still very large, the sample covariance matrix is invertible but numerically ill-conditioned: Ω̂ may be close to singular, with eigenvalues close to zero. When N is larger than T, Ω̂ is not even invertible. This occurs especially when T is of the same order as N (T greater than N for identifiability, but with T/N close to one). This fact has been well identified in the literature. Seminal references may be found in the work of Muirhead (1987) or Pafka and Kondor (2004). The asymptotic theory with N, T → ∞ is developed in Ledoit and Wolf (2002).
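
The following sketch (synthetic i.i.d. returns, arbitrary parameters) makes this instability tangible: with T barely larger than N, the sample covariance matrix is poorly conditioned and a small perturbation of the expected-return estimate produces a disproportionate change in the mean-variance weights.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 120                                   # T barely larger than N: ill-conditioned case
returns = rng.normal(0.0, 0.01, size=(T, N))      # synthetic daily returns

omega_hat = np.cov(returns, rowvar=False)
print("condition number:", np.linalg.cond(omega_hat))

mu_hat = returns.mean(axis=0)
mu_hat_perturbed = mu_hat + rng.normal(0.0, 1e-4, size=N)   # small perturbation of expected returns

w1 = np.linalg.solve(omega_hat, mu_hat)                      # unnormalized mean-variance direction
w2 = np.linalg.solve(omega_hat, mu_hat_perturbed)
w1, w2 = w1 / np.abs(w1).sum(), w2 / np.abs(w2).sum()        # normalize gross exposure to 1

# Relative change of the weight vector for a small change in the expected returns.
print("relative weight change:", np.linalg.norm(w1 - w2) / np.linalg.norm(w1))
```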

It has been known for a long time (see e.g. Black (1976)) that the variance of returns is in general not constant, whatever the asset class. The estimates do vary, but quite slowly in time: it is sufficient to refresh the estimator frequently. The estimation may usually be improved by estimating correlations and volatilities separately, but this is a comparatively minor problem next to the estimation of the expected returns µ. Estimating µ is a difficult task, as underlined in a famous contribution of Merton (1980). The idea is that if we fix two points in time, t0 and t1, and observe a process S sampled at t0 and t1, the estimation of the drift we obtain is dramatically impacted by the potential fluctuations of the stock between those two points. If the intrinsic volatility is huge, then the noise on the estimation of the drift is high and the estimator is of poor quality. Moreover, if we cannot sub-sample S between t0 and t1, we have no guess on the level of volatility, no guess on the potential heteroskedasticity, etc. In other words, knowing where we start and where we end does not tell us much about the regularity of the path.
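
Merton's argument can be illustrated with a small simulation (a sketch with arbitrary parameters, not a result from the text): the dispersion of the drift estimate depends only on the total observation span, whereas the volatility estimate improves as the path is sampled more finely.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, horizon = 0.08, 0.25, 5.0          # hypothetical annual drift, volatility, and span in years

def simulate_and_estimate(n_steps, n_paths=2000):
    dt = horizon / n_steps
    increments = rng.normal(mu * dt, sigma * np.sqrt(dt), size=(n_paths, n_steps))
    drift_hat = increments.sum(axis=1) / horizon              # only uses the two end points of the path
    vol_hat = np.sqrt((increments**2).sum(axis=1) / horizon)  # realized volatility, uses the whole path
    return drift_hat.std(), vol_hat.std()

for n_steps in (10, 100, 10000):
    d_std, v_std = simulate_and_estimate(n_steps)
    print(f"{n_steps:6d} steps: std of drift estimate {d_std:.3f}, std of vol estimate {v_std:.4f}")
# The drift error stays around sigma / sqrt(horizon) ~ 0.11 whatever the sampling frequency,
# while the volatility error shrinks as the path is sampled more finely.
```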

Take-Home Message

Studying the PCA of the covariance matrix of the assets is useful for describing the factors that drive the risk of the assets, yet it is also important to understand how instability may appear and control for it within a framework of portfolio allocation.

9.1.3 Should we work with covariance or correlation?

Should we estimate the correlation or the covariance matrix directly? The covariance matrix Ω of N asset returns writes:

Ω = D C D,

where D = diag(σ1, ..., σN) is the diagonal matrix of size N with the individual volatilities of the assets on the diagonal (0 elsewhere), and C is the N × N correlation matrix between the assets. Whether we estimate Ω directly or estimate D and C separately, we should end up with similar results. Empirically, things are different. The answer depends on the experience, preferences and habits of practitioners. Academic and empirical references in finance are scarce since, depending on the data, practitioners will rather trust their experience and their knowledge of the dataset. The answer that comes up most frequently is that it is preferable to separate the estimation of the volatilities from the estimation of the correlations. A practical argument is that it allows one first to build an expertise on each item to be estimated, but also that the spectral analysis of the correlation matrix is numerically more stable and less tricky.

Correlation estimation is not so difficult since the sample counterpart estimator is clear and easy to implement. The resulting matrix has well-identified properties (symmetry, diagonal terms equal to 1). The trace of the matrix is known and equal to N. The estimation of the volatility/variance may be more complicated, but splitting the problem allows any kind of sophistication: ARCH/GARCH models, which are now quite common (Engle (1982)), exponential moving averages, mixed frequencies, etc. Potters et al. (2005) adapts the usual covariance estimator using exponential moving averages of returns, and also deals with the mixed-frequency approach in the most general way. Working on the correlation matrix is a way to "normalize" the data. In the case where stocks have volatilities of different scales, working on the spectral decomposition of the correlation matrix is safer and allows one to focus on the real structure of linear dependence.
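
A small sketch of this split, assuming a `returns` array of shape (T, N): volatilities are estimated here with a simple exponential moving average (the decay parameter 0.94 is an arbitrary choice, only loosely inspired by the exponentially-weighted estimators discussed in Potters et al. (2005)), the correlation matrix is the sample Pearson estimator, and the covariance is reassembled as Ω = D C D.

```python
import numpy as np

def split_estimate(returns, ewma_lambda=0.94):
    """Estimate volatilities (EWMA) and correlations separately, then rebuild Omega = D C D."""
    T, N = returns.shape
    # Exponentially weighted variance (zero-mean assumption), more weight on recent observations.
    weights = ewma_lambda ** np.arange(T - 1, -1, -1)
    weights /= weights.sum()
    vols = np.sqrt(weights @ returns**2)            # per-asset EWMA volatility
    corr = np.corrcoef(returns, rowvar=False)       # sample Pearson correlation matrix
    D = np.diag(vols)
    return vols, corr, D @ corr @ D                 # Omega = D C D

# Toy usage with synthetic returns.
rng = np.random.default_rng(2)
returns = rng.normal(0.0, 0.01, size=(500, 5))
vols, corr, cov = split_estimate(returns)
print(np.round(corr, 2))
```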

We will study the spectral analysis of correlation matrices in the following. The eigenvalues of the correlation and covariance matrices differ essentially through the scaling by the individual variances of the assets, so all the qualitative conclusions drawn from the spectrum of correlation matrices should hold for the spectrum of the covariance matrix. It is important to state that working with correlations does not make the stationarity problem disappear. Variances, asset by asset, can vary a lot in time, but so can correlations. Whatever the object of interest, the non-stationarity of the estimated object is a main problem that leads to estimation error, with out-of-sample consequences.

9.1.4 PCA and SVD

When studying empirical correlation matrices, one may find alternating discussions on the use of "PCA" (Principal Component Analysis) or of "SVD" (Singular Value Decomposition). What are they, is there a difference, is one preferable to the other? This duality of notions may be confusing, but fortunately the two collapse into one in the present case, as we deal with symmetric, positive semi-definite matrices. In finite dimension, in the vector space¹ ℝ^N to which the weight vectors w on the N assets belong, matrices are the natural incarnation of linear operators. If R is the T × N matrix of the return series of the N assets over T dates, and if K : ℝ^N → ℝ^T is the operator that maps an N-dimensional real-valued vector w onto Rw, then the covariance matrix Ω = R′R/T represents (up to the normalizing factor) the ℝ^N → ℝ^N operator K⋆K. This implies in particular that the eigenvalues of R′R coincide with the squared singular values of the operator K. Said alternatively, in the linear, finite-dimensional case, diagonalizing Ω or proceeding to the SVD of the initial return observations is identical up to a square-root transformation of the eigenvalues. We will use one term or the other interchangeably in the following.

There is a possible confusion to clear up when comparing SVD and PCA. PCA (see Härdle and Simar (2015) for an in-depth introduction) is a well-known learning technique aiming at finding linear combinations of the initial variables (vectors, or axes) that explain the data best in the sense of variance decomposition. The technique also relies on the diagonalization of the covariance matrix of the initial variables. The eigenvectors are homogeneous to axes of representation, and the eigenvalues are homogeneous to the proportion of variance represented by each axis. Should we keep only the most important combinations of the data, we would retain the axes whose corresponding eigenvalues are the highest. PCA is adapted to "any" kind of variable, that is to say not only to financial returns, and is widely used in data analysis.

1 We recall that ℝ^N with its natural scalar product is a Hilbert space.

However, in finite dimension, SVD is also understood as the decomposition of the previous observation matrix R as:

R = U ∆ V′,

where U is a T × D matrix, ∆ is a D × D diagonal matrix with descending diagonal elements, V is an N × D matrix with orthonormal columns, and D is the rank of the matrix R. In particular, an SVD is possible for a non-square matrix, and this decomposition is unique. D is equal to the number of non-zero eigenvalues of R′R = V ∆² V′. In the case of a covariance matrix estimated on stock returns, we would hope D to be close to N, even if this is difficult to check due to estimation error.

Technically, using a PCA would be equivalent to diagonalizing R′R directly, but it is in practice numerically more efficient (or less problematic) to first compute the SVD through U, ∆ and V and to recover the eigenvalues as the squared diagonal elements of ∆. Computing the eigen decomposition of the empirical correlation matrix directly requires a numerically robust estimation method; the sensitivity to problems in the data (missing values, non-synchronized series, etc.) may otherwise be an additional source of instability (some negative empirical eigenvalues, for instance).
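
The equivalence can be checked numerically on synthetic data: the eigenvalues of R′R obtained by direct diagonalization coincide with the squared singular values of R returned by the SVD.

```python
import numpy as np

rng = np.random.default_rng(3)
T, N = 300, 20
R = rng.normal(size=(T, N))                 # synthetic T x N return matrix
R -= R.mean(axis=0)                         # center the columns

# Route 1: diagonalize R'R directly (PCA route, up to the 1/T normalization).
eigvals = np.linalg.eigvalsh(R.T @ R)[::-1]          # sorted in descending order

# Route 2: SVD of the observation matrix, then square the singular values.
singular_values = np.linalg.svd(R, compute_uv=False)
eigvals_from_svd = singular_values**2

print(np.allclose(eigvals, eigvals_from_svd))        # True: same spectrum
```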

9.2 Eigen decomposition and financial interpretation

9.2.1 Notations

We decompose C, the correlation matrix, through its eigen-decomposition. C is related to Ω through:

Ω = diag(σ1, ..., σN) C diag(σ1, ..., σN),

and C is an N × N symmetric, positive semi-definite matrix with ones along its diagonal. We work with estimators: C, Ω, eigenvalues and eigenvectors should be understood as estimated values. The estimator notation will be dropped in this section for the sake of clarity unless otherwise stated. The eigen-decomposition of C is written:

C = Φ D Φ′.     (9.1)

Both Φ and D are N × N square matrices. D is the diagonal matrix:

D = diag(λ1, . . . , λN),

made of the eigenvalues on the diagonal, 0 elsewhere. We will assume that the eigenvalues are sorted in descending order, that is λ1 ≥ λ2 ≥ ... ≥ λN ≥ 0, and this sorting also drives the indexation of the associated eigenvectors. Φ is the matrix of eigenvectors Φ = (φ1, . . . , φN), with the additional assumption that this set of eigenvectors forms an orthonormal basis of ℝ^N, which means that for all (i, j) ∈ [1; N]², φi · φj = δij. In particular, we have Φ⁻¹ = Φ′ and:

C⁻¹ = Φ diag(1/λ1, ..., 1/λN) Φ′.
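
In code, on synthetic data, the decomposition and the inverse read as follows (note that `numpy.linalg.eigh` returns eigenvalues in ascending order, so they are re-sorted to match the descending convention used here):

```python
import numpy as np

rng = np.random.default_rng(4)
returns = rng.normal(size=(1000, 8))
C = np.corrcoef(returns, rowvar=False)               # sample correlation matrix (symmetric PSD)

eigenvalues, eigenvectors = np.linalg.eigh(C)        # ascending eigenvalues, orthonormal eigenvectors
order = np.argsort(eigenvalues)[::-1]                # re-sort so that lambda_1 >= ... >= lambda_N
lam, phi = eigenvalues[order], eigenvectors[:, order]

D = np.diag(lam)
print(np.allclose(C, phi @ D @ phi.T))                                   # C = Phi D Phi'
print(np.allclose(np.linalg.inv(C), phi @ np.diag(1.0 / lam) @ phi.T))   # C^{-1} = Phi D^{-1} Phi'
```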

9.2.2 Eigenvalues

9.2.2.1 Interpretation

Eigenvalues form an ordered vector of size N, with non-negative elements. Eigenvalues represent the variances associated with the eigenvectors, hence the part of the risk that each eigenvector may explain. Eigenvalues of a covariance matrix are homogeneous to risk but are at the same time a proxy for information. The higher the eigenvalue, the higher the importance of the related eigenvector in explaining the asset returns. The highest eigenvalues, with their associated eigenvectors, therefore represent the main directions for investment. As we introduce just below, if one uses an orthonormal basis to diagonalize the correlation matrix, eigenvectors are orthogonal to each other. This means that any portfolio orthogonal to all of the F first eigenvectors (with F a fixed integer such that 0 < F < N) will evolve in the subspace spanned by the eigenvectors of order F+1 to N, and that the risk it bears will be bounded from above by λ_{F+1}.

9.2.2.2 Random Matrix Theory

Eigenvalues of random matrices are the object of a vast literature. The spectrum of sample correlation matrices is well understood; things are less clear for eigenvectors. But correlation matrices are estimated: even if the true matrix were constant in time (which is probably wrong), we would have to estimate it from finite time series. The previous statements remain correct, but the estimated eigenvalues are essentially noisy estimators, as the estimated correlation matrix remains a stochastic object.

Let us assume that r = T/N is fixed and greater than 1. When the coefficients of the matrix of asset price changes are assumed to be independent and identically distributed, the distribution of the eigenvalues of the resulting correlation matrix is known when N, T → ∞ with r fixed. This density only makes sense in an asymptotic framework and is known thanks to the work of Marčenko and Pastur (1967). It may serve as a guideline to evaluate the information actually carried by a sample correlation matrix, and a rather large number of studies have already been conducted on financial time series.

The study of random matrices, Random Matrix Theory (henceforth RMT), helps in this context to separate noise from real information. A detailed description is unfortunately far beyond our scope and we are more interested here in the use of RMT in a financial context. Seminal contributions may be found in Marčenko and Pastur (1967), in Mehta (2004) for a complete review, or in Laloux et al. (1999), Malevergne and Sornette (2004), Potters et al. (2005), Bouchaud et al. (2007), Bouchaud and Potters (2011). RMT provides insights on the spectrum² of the sample correlation matrix. The spectrum of sample covariance matrices of asset returns is generally quite similar to the spectrum obtained from i.i.d. random variables, apart from the largest eigenvalues. The question becomes whether the information coming from this matrix, and channelled through its eigenvalues, can be considered significantly different from pure noise. In the case of pure randomness (even if this term should be handled with care!) the empirical density f(λ) of the eigenvalues of the empirical matrix should be equal to the so-called Marčenko-Pastur distribution:

f(λ) = r √((λmax − λ)(λ − λmin)) / (2π σ² λ),

where σ² is the variance of the initial observations (the returns, here) that led to the estimation of the matrix (for correlations or normalized returns, we should have σ² = 1). λmax and λmin depend explicitly on the noise ratio r = T/N:

λmin = σ² (1 + 1/r − 2√(1/r))   and   λmax = σ² (1 + 1/r + 2√(1/r)).

f(·) has an inverted-V shape with finite support: the density is equal to zero outside the segment [λmin; λmax]. There is no eigenvalue to be expected in the [0; λmin] segment, and the case where r is close to one is specific in the sense that λmin becomes close to 0 and the mode of the distribution diverges. The width of the support is therefore driven by √(1/r). Using this distribution in a financial context allows one to guess which eigenvalues may be potential candidates for representing informative directions. Every eigenvalue below λmax, i.e. inside the bulk of the MP distribution, may be considered as negligible and equivalent to noise.

2 RMT also provides insights into the expected distribution of eigenvector coordinates in the case of true randomness: it is modelled by the Porter-Thomas distribution, as underlined in Laloux et al. (1999).
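
A minimal sketch of the Marčenko-Pastur edges and density, checked against the spectrum of a purely random correlation matrix (i.i.d. synthetic returns, σ² = 1, same aspect ratio T/N as in the numerical examples below):

```python
import numpy as np

def mp_edges(r, sigma2=1.0):
    """Marcenko-Pastur support [lambda_min, lambda_max] for a noise ratio r = T / N."""
    lam_min = sigma2 * (1 + 1/r - 2*np.sqrt(1/r))
    lam_max = sigma2 * (1 + 1/r + 2*np.sqrt(1/r))
    return lam_min, lam_max

def mp_density(lam, r, sigma2=1.0):
    """Marcenko-Pastur density evaluated at the array lam (zero outside the support)."""
    lam_min, lam_max = mp_edges(r, sigma2)
    inside = (lam > lam_min) & (lam < lam_max)
    out = np.zeros_like(lam)
    out[inside] = r * np.sqrt((lam_max - lam[inside]) * (lam[inside] - lam_min)) / (2*np.pi*sigma2*lam[inside])
    return out

# Pure-noise check: eigenvalues of the correlation matrix of i.i.d. returns stay inside the support.
rng = np.random.default_rng(5)
T, N = 3273, 502
noise = rng.normal(size=(T, N))
eigvals = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))

print(mp_edges(T / N))                                        # approximately (0.37, 1.94)
print(eigvals.min(), eigvals.max())                           # close to the theoretical edges
print(np.round(mp_density(np.linspace(0.1, 2.5, 6), T / N), 3))
```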

9.2.3 Eigenvectors

9.2.3.1 Interpretation

Eigenvectors are homogeneous to portfolios. The matrix Φ, whose columns are the eigenvectors φi, is a basis of orthonormal portfolios summarizing best the linear information available through the observation of asset returns. In financial terms, combining those portfolios with expected returns is a way to find linear combinations of these orthonormal portfolios, which become the building blocks of the final arbitrage portfolios. Their returns are in particular uncorrelated by construction. The first eigenvectors (corresponding to the highest eigenvalues) explain the information best, in the sense that they have the greatest contribution to the overall variance. Hence, the last eigenvectors (i.e. φi with i ≫ 1) are the least informative. In the degenerate case, an eigenvector with a close-to-zero eigenvalue means that Cφk ≈ 0, which means, in financial terms, that portfolio k has a variance that is close to zero and may appear as risk-free. Then, if its expected return is positive, even slightly, the allocation process understands it as an arbitrage opportunity and overweights the allocation to this portfolio, artificially considered by the algorithm as a "good opportunity": a positive profit with nearly no risk. Such a near-zero eigenvalue reflects the fact that the underlying assets are nearly collinear: the portfolio φk replicates, almost exactly, a combination of the other assets and carries no additional information. Thus, eigenvectors corresponding to the smallest eigenvalues are portfolios that are redundant in terms of information. Technically, an eigenvector is used to describe, represent and qualify one direction in an N-dimensional space. The subspace spanned by a set of orthonormal vectors is identical to the one spanned by the same vectors with all coordinate signs flipped: eigenvectors are defined up to a sign.

Evaluating the noise of estimated eigenvectors over the whole spectrum is a much more sophisticated challenge. To our knowledge, there is no complete overview or exhaustive framework able to model the entire set of eigenvectors of any given empirical matrix. Some enlightening results do exist, but they are rather sparse in the sense that the literature on the topic is far smaller than the equivalent literature on empirical eigenvalues. Eigenvectors are consequently represented or studied in the order implied by the eigenvalues. They may convey some information, but a distinction has to be made depending... on the eigenvalue they are associated with! Whereas the eigenvectors corresponding to the biggest eigenvalues are rather stable (even if they may change through time), eigenvectors corresponding to the smallest eigenvalues are noisier, more unstable and quite impossible to interpret.

For equities, the first (top) eigenvector coincides with the market mode and is a bit easier to study. Studying the variogram of the first eigenvector, Bouchaud and Potters (2011) shows that the natural time of fluctuation of this mode is around 100 days, which is quite long but sufficient to understand that this mode is nevertheless changing and evolving. Allez and Bouchaud (2012) identifies and studies the dynamics of eigenvectors corresponding to the largest eigenvalues. Generally, the eigenvectors that lend themselves to an interpretation are those associated with the highest eigenvalues, and they can be interpreted in terms of sectorial, geographical, or macroeconomic bets. For instance, if one observes an eigenvector whose positive coordinates are associated with stocks from the financial sector while the negative coordinates correspond to all the other stocks, this would indicate a sectorial bet: buying financial stocks and shorting all other stocks. However, such patterns evolve in time, change, decrease, switch in order, and are relatively hard to track down since it requires a qualitative reverse-engineering of the vectors. One main contribution that assesses precisely the economic and sector interpretation of the first eigenvectors is the work of Plerou et al. (2002), but see also Uechi et al. (2015). References stating that the sample eigenvectors corresponding to the smallest eigenvalues may carry limited information can be found e.g. in Bai and Silverstein (2010), Pastur and Shcherbina (2011) or Monasson and Villamaina (2015). Another result that illustrates the specificity of eigenvectors is that, contrary to the case of eigenvalues, letting N grow to infinity does not bring stability. As N grows, the noise in eigenvectors generally does not decrease and fluctuations remain. No limit is to be identified, and a large dimension N has no stabilizing effect on the estimated vectors.

9.2.3.2 Sample eigenvectors

What is the suitable tool³ to study sample eigenvectors? The main quantity appears to be the overlap, which is, for two N-dimensional vectors, the scalar product in ℝ^N between them. In fact, as eigenvectors are defined up to their sign, the real quantity of interest is the squared overlap. The squared overlap lies between 0 and 1 and describes the similarity between the vectors. The overlap is useful to answer two questions. The first question is the effective similarity between an eigenvector of order i ∈ [1; N] coming from a sample matrix and the true vector of the same order i. The second question is the potential similarity between two eigenvectors of order i coming from two matrices estimated in two different contexts (for instance, two sample matrices estimating the same unknown matrix Ω but at two different times). A useful prism for the first question is that eigenvectors have to be studied according to whether the eigenvalue they are associated with lies inside or outside the bulk of the eigenvalue distribution. If the corresponding eigenvalue is in the bulk of the distribution, few things may be said since the squared overlap between the sample vector and the corresponding true one is of order 1/N. Intuitively, we could say that for the eigenvalues in the bulk of the distribution, there is so much noise that the notion of order in the spectrum is blurred and that there is little chance of getting a precise mapping between sample vectors and the true ones.

However, as soon as we study eigenvectors for eigenvalues standing out of the bulk (isolated eigenvalues that convey information), the overlap is of order 1, meaning that in this case a clear, stable and well-identified link is maintained between the true vector and its estimate. See Benaych-Georges and Nadakuditi (2011) as a reference for this. References defining and using the overlap also include, for instance, Allez and Bouchaud (2014), Allez et al. (2015), and Bun et al. (2016), to which we refer for an extensive reading. In particular, recalling that the overlap between the first eigenvector and the other ones is always very weak, Monasson and Villamaina (2015) finds that it is still possible to exploit the accumulation of weak "cross-information" (materialized by the overlaps) between the first eigenvector⁴ and the many other eigenvectors of a same matrix, in order to recover some additional information on the first one and improve prediction.
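
The behaviour of the squared overlap can be illustrated on synthetic data (a sketch with an artificial one-factor structure standing in for the market mode): eigenvectors are estimated on two disjoint halves of the sample and compared rank by rank; the top eigenvector overlaps strongly across the two halves, while bulk eigenvectors overlap at the 1/N noise level.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N = 2000, 200
market = rng.normal(size=(T, 1))                         # one strong common factor ("market mode")
returns = 0.5 * market + rng.normal(size=(T, N))         # factor exposure plus idiosyncratic noise

def sorted_eigvecs(x):
    C = np.corrcoef(x, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    return vecs[:, np.argsort(vals)[::-1]]               # columns sorted by descending eigenvalue

phi_a = sorted_eigvecs(returns[: T // 2])                # first half of the sample
phi_b = sorted_eigvecs(returns[T // 2 :])                # second half of the sample

squared_overlap = np.sum(phi_a * phi_b, axis=0) ** 2     # squared overlap of same-rank eigenvectors
print("top eigenvector :", squared_overlap[0])           # close to 1: stable, informative direction
print("bulk (median)   :", np.median(squared_overlap[1:]), "~ 1/N =", 1 / N)
```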

9.3 Heuristics

Numerical Example 9.1 We plot in Figure 9.1 the correlation matrix of the daily returns of the S&P500 components. We fix the composition of the S&P500 at the date⁵ of 2015/01/01 and consider 13 years of history, i.e. returns for all available prices between 2002/01/01 and 2015/01/01.

3 I would like to thank Joël Bun for his help on the topic of eigenvector interpretation.

4 Eigenvectors of a given matrix are built to be orthonormal, so the scalar product between two different eigenvectors should be equal to zero. However, in the precise context of Monasson and Villamaina (2015), this scalar product has to be understood as the scalar product between the eigenvectors of order ≥ 2 of the estimated matrix and a first, "true" eigenvector on which we have a (theoretical) prior.

5 The example taken in this chapter is essentially useful for a statistical, descriptive study. We do not follow the S&P500 index dynamically over the period, with stocks potentially leaving and entering the index, as we are mainly interested in the underlying statistical aspects of the study of a pool of stocks. A real backtest on such a pool of fixed stocks would lead to survivorship bias but, again, this is not our point here.

We have in this case T = 3273 and N = 502 (502 stocks, 3273 daily returns). This makes a ratio T/N ≈ 6.52. Returns are normalized by their unconditional standard deviation on the whole sample. Correlation is computed as the Pearson correlation. We observe that correlations are essentially positive (only 14 negative terms over the N(N−1)/2 = 125751 possible pairs of assets). The mean correlation over assets (measured as the flat average of the off-diagonal terms) is about 0.346.⁶
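
The computations of this example can be reproduced schematically as follows, assuming daily returns are available in a (T, N) array (synthetic data is used here as a stand-in, since the S&P500 dataset itself is not distributed with these notes):

```python
import numpy as np

rng = np.random.default_rng(7)
T, N = 3273, 502
market = rng.normal(size=(T, 1))
returns = 0.6 * market + rng.normal(size=(T, N))     # synthetic stand-in for the S&P500 panel

returns = returns / returns.std(axis=0)              # normalize by the unconditional standard deviation
C = np.corrcoef(returns, rowvar=False)               # Pearson correlation matrix

off_diag = C[np.triu_indices(N, k=1)]                # the N(N-1)/2 distinct pairs
print("number of pairs      :", off_diag.size)       # 125751 for N = 502
print("negative correlations:", int((off_diag < 0).sum()))
print("mean correlation     :", off_diag.mean())
```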

This generally positive correlation accounts for the existence of a general "market mode" (all stocks broadly co-moving with the market), reflecting the index nature of the S&P500 and the underlying motivations and message of the CAPM. However, the overall correlation is expected to increase in periods of market turmoil. In case of a directional move of the market, stock co-movements are higher in bear markets than in bull markets: stocks are relatively more correlated in bad times. This constitutes a well-known stylized fact, supported for instance by Ang and Chen (2002), Ang and Bekaert (2002), or Baele (2005) among others.

We illustrate this empirical evidence by plotting in Figure 9.2 the box-plot distribution of the correlation of S&P500 stocks with the index itself, per calendar year (the sample for the index, as for each stock, is therefore its returns within one calendar year). In this figure, this is not the correlation of stocks among themselves, but of stocks with the index; this should however convey roughly the same nature of information. We observe a clear spike in the correlations' distribution during the economic crisis lasting approximately from 2007 to 2009, supporting the observation that, in bear markets, stocks tend to correlate more with the market.

Figure 9.3 also illustrates this fact: we plot, stock by stock, the correlation with the index obtained on the overall period (x-axis) vs. the correlation with the index computed on the restricted crisis period, i.e. the years 2007, 2008 and 2009 (y-axis). The solid line represents the identity line. We observe that the vast majority of correlations are shifted upwards by roughly 10-15%. This increase in correlations seems to be a general pattern.

Numerical Example 9.2 Figure 9.4 shows the eigenvalue distribution of the correlation matrix previously displayed in Figure 9.1. The x-axis is in log10 scale for ease of visualization. The sum of the values of the spectrum is obviously equal to 502, the number of assets, because of trace preservation. Every eigenvalue is positive here. We clearly see that, apart from 10 or 20 values that are way greater than any others, the remaining (480 or so) values are very close to each other and belong to a main bulk of values. The amplitude of the values in the spectrum is wide, since the smallest eigenvalue is equal to 0.06 and the highest one is equal to 183. The highest eigenvalue is related to the "first mode" of the matrix, which is commonly called the market mode (see below). The second highest eigenvalue is only equal to 15.2, which is far less than the highest one, the latter alone explaining more than a third of the total "normalized risk" (183 out of a trace of 502).

Numerical Example 9.3 Following the previous analysis, we have N = 502 and T = 3273, and since we are working with a correlation matrix, σ² = 1, which gives a value of λmax^MP ≈ 1.94. This value is represented in Figure 9.4. With this value of λmax^MP, one can estimate that only 17 eigenvalues are greater than this edge, which means that only approximately 17 eigenvectors may support real information, with nearly 96.6% of the eigenvalues lying in the bulk of the distribution. The sum of the corresponding 17 eigenvalues is roughly equal to 267, which means that only 3% of the eigenvectors account for nearly 53% of the total variance.

6 Note that this value is computed on daily returns. Potters et al. (2005) finds a value around 0.29 on the S&P500 on a different period. The figures are lower for intra-day data: around 0.2 for 1-hour returns and around 0.05 for 5-minute returns (see Potters et al. (2005) again).

Figure 9.1: Daily correlation matrix of the S&P500 constituents at the date of 2015/01/01, for the period from 2002/01/01 to 2014/12/31. Data represent 502 assets and 3273 daily returns. Correlation is computed as the Pearson correlation. The mean correlation is equal to 0.346.
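
The counting of this example can be reproduced schematically as follows (again with a synthetic one-factor panel standing in for the S&P500 data, so the actual number of eigenvalues above the edge will differ from the 17 reported above):

```python
import numpy as np

rng = np.random.default_rng(8)
T, N = 3273, 502
market = rng.normal(size=(T, 1))
returns = 0.6 * market + rng.normal(size=(T, N))          # synthetic stand-in for the S&P500 panel

C = np.corrcoef(returns, rowvar=False)
eigvals = np.linalg.eigvalsh(C)

r = T / N
lam_max_mp = 1 + 1/r + 2*np.sqrt(1/r)                     # Marcenko-Pastur upper edge, sigma^2 = 1
informative = eigvals[eigvals > lam_max_mp]

print("MP upper edge          :", round(lam_max_mp, 2))               # ~1.94
print("eigenvalues above edge :", informative.size)
print("fraction of the trace  :", informative.sum() / eigvals.sum())  # share of total variance explained
```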

Numerical Example 9.4 We compare in Figure 9.5 the first and the second eigenvectors of the correlation matrix displayed in Figure 9.1. We have ranked the stocks according to the algebraic value of their coordinate on the second eigenvector, for the sake of illustration. Consequently, each point on the x-axis represents a stock with its corresponding coordinates on the first and the second eigenvectors. We observe in particular that:

• the first eigenvector (represented by points), associated with the highest eigenvalue (equal to 183), has coordinates that are all positive, with a mean equal to 1/502 ≈ 0.002 (thin dashed line);

• the second eigenvector (represented by a dashed line), associated with the second highest eigenvalue (equal to 15), has coordinates that are both positive and negative, with a zero mean.

The first eigenvector is often called the market mode since, up to minor variations, all the coordinates of the vector are nearly equal. This means that the return of this eigenvector, seen as a portfolio, is nearly the average return of the set of stocks, hence approximately the return of the market. This is even stronger in the present case since, when dealing with the components of the S&P500 index, we nearly obtain the returns of the index itself, which may stand for "the market". We recover a CAPM interpretation for this eigenvector, which represents the major common market moves. Such an eigenvector will therefore generally always appear, whatever the period or the estimation universe.

Figure 9.2: Boxplot distribution of the correlation of the S&P500 constituents with the S&P500 index, per calendar year (one value per stock and per year). The total sample of returns is still comprised between 2002/01/01 and 2014/12/31, for 502 fixed assets.

The second eigenvector has features that are easy to explain but more difficult to interpret. The fact that we obtain positive and negative values is straightforward. As the basis of eigenvectors has to be orthogonal, if the first eigenvector is close to a constant vector (all coordinates nearly equal), all of the other N−1 eigenvectors have to be orthogonal to it, that is, they must have a zero scalar product with the first eigenvector, hence (approximately) a zero sum of coordinates. So whatever the order of the eigenvector we could have represented along with the first one, we would observe such a pattern (positive and negative coordinates with a zero sum). The fact that the median coordinate is around zero (as many positive as negative coordinates) may be commonly observed but is not compulsory.

Remark 9.1 With a given covariance matrix, changing the expected return vector may lead to instability. As time passes, however, one may be concerned with refreshing the covariance matrix. A possibility could be to refresh only the volatilities, to estimate a new correlation matrix, and to test whether the correlation structure is similar to the previous one. For tests of the equality of two correlation matrices, see the procedures described in Jennrich (1970) or Larntz and Perlman (1988).

Figure 9.3: Scatter-plot of the stock-by-stock correlation with the S&P500 index obtained on the overall period 2002/01/01 to 2014/12/31 (x-axis) vs. the same correlations with the index computed on the calendar years 2007, 2008 and 2009, representing the crisis period (y-axis). The solid line is the identity line. The sample comprises 502 fixed assets.

Main Bibliography

• Marčenko and Pastur (1967)

• Merton (1980)

• Jorion (1986)

• Muirhead (1987)

• Michaud (1989)

• Mehta (2004)

• Pafka and Kondor (2004)

• Ledoit and Wolf (2002)

Figure 9.4: Histogram of the eigenvalue distribution of the same correlation matrix displayed in Figure 9.1. Eigenvalues are displayed as raw values but represented with a log10 scale. The eigenvector corresponding to the highest eigenvalue is to be understood as the "market" mode. The vertical dashed line indicates the value of the λmax edge in the sense of the Marčenko-Pastur analysis, as detailed in Example 9.3 for σ² = 1.

Figure 9.5: Comparison of the first and second eigenvectors (both normed with a sum of absolute terms equal to one) of the same correlation matrix displayed in Figure 9.1. The x-axis is an arbitrary label of the stocks after sorting the coordinates of the second eigenvector in ascending order. We plot on the y-axis the normalized values of the coordinates of the first and second eigenvectors.

Bibliography

Alessi, L. and Capasso, M. B. M. (2010). Improved Penalization for Determining the Number of Factors in Approximate Factor Models. Statistics and Probability Letters, 80(23-24):1806–1813.

Alexander, C. (2008). Moving Average Models for Volatility and Correlation, and Covariance Matrices, chapter 62. John Wiley & Sons.

Allez, R. and Bouchaud, J.-P. (2012). Eigenvector dynamics: General theory and some applications. Physical Review E, 86:046202.

Allez, R. and Bouchaud, J.-P. (2014). Eigenvector dynamics under free addition. Random Matrices: Theory and Applications, 3(3).

Allez, R., Bun, J., and Bouchaud, J.-P. (2015). The eigenvectors of Gaussian matrices with an external source. https://arxiv.org/abs/1412.7108.

Almgren, R., Thum, C., Hauptmann, E., and Li, H. (2005). Direct estimation of equity market impact. Risk, 18(7):58–62.

Amihud, Y. and Mendelson, H. (1986). Asset pricing and the bid-ask spread. Journal of Financial Economics, 17(2):223–249.

Amin, G. and Kat, H. (2003). Welcome to the Dark Side: Hedge Fund Attrition and Survivorship Bias over the Period 1994-2001. Journal of Alternative Investments, 6(1):57–73.

Andersen, T., Bollerslev, T., Diebold, F., and Labys, P. (2003). Modeling and forecasting realized volatility. Econometrica, 71(2):579–625.

Andersen, T., Dobrev, D., and Schaumburg, E. (2012). Jump-robust volatility estimation using nearest neighbor truncation. Journal of Econometrics, 169(1):75–93.

Anderson, T. (2003). An Introduction to Multivariate Statistical Analysis. Probability and Statistics. Wiley, third edition.

Ang, A. (2014). Asset Management: A Systematic Approach to Factor Investing. Financial Management Association Survey and Synthesis. Oxford University Press.

Ang, A. and Bekaert, G. (2002). International asset allocation with regime shifts. Review of Financial Studies, 15(4):1137–1187.

Ang, A. and Chen, J. (2002). Asymmetric Correlations of Equity Portfolios. Journal of Financial Economics, 63(3):443–494.

Ang, A., Goetzmann, W., and Schaefer, S. (2009a). Evaluation of active management of the Norwegian Government Pension Fund - Global. Technical report, Report to the Norwegian Ministry of Finance.

137 Ang, A., Hodrick, R., Xing, Y., and Zhang, X. (2006). The Cross-Section of Volatility and Expected Returns. The Journal of Finance, 61(1):259–299.

Ang, A., Hodrick, R., Xing, Y., and Zhang, X. (2009b). High Idiosyncratic Volatility and Low Returns: International and Further US Evidence. Journal of Financial Economics, 91:1–23.

AQR (2014). Our model goes to six and saves value from redundancy along the way. Working paper, AQR.

Arnold, B. and Groeneveld, R. A. (1995). Measuring Skewness with Respect to the Mode. American Statistician, 49(1):34–38.

Arnott, R., Beck, N., and Kalesnik, V. (2016). How Can Smart Beta Go Horribly Wrong?

Arnott, R., Harvey, C., Kalesnik, V., and Linnainmaa, J. (2019). Alice’s Adventures in Factorland: Three Blunders That Plague Factor Investing. https://papers.ssrn.com/sol3/papers.cfm? abstract_id=3331680.

Arnott, R., Hsu, J., and Moore, P. (2005). Fundamental Indexation. Financial Analysts Journal, 61(2):83–99.

Arnott, R. D. and Bernstein, P. L. (2002). What risk premium is “normal”? Financial Analysts Journal, 58(2):64–85.

Asness, C. (1997). The Interaction of Value and Momentum Strategies. Financial Analysts Journal, 53(2):29–36.

Asness, C. (2015). How can a strategy everyone knows about still work?

Asness, C., Frazzini, A., Gormsen, N., and Pedersen, L. (2019a). Betting against correlation: Testing theories of the low-risk effect. Journal of Financial Economics.

Asness, C., Frazzini, A., Israel, R., Moskowitz, T., and Pedersen, L. (2015). Size matters, if you control your junk. Working paper, AQR.

Asness, C., Moskowitz, T., and Pedersen, L. (2013). Value and momentum everywhere. The Journal of Finance, 68(13):929–985.

Asness, C. S., Frazzini, A., and Pedersen, L. H. (2019b). Quality minus junk. Review of Accounting Studies, 24(1):34–112.

Baele, L. (2005). Volatility spillover effects in european equity markets. Journal of Financial and Quantitative Analysis, 40(2):373–401.

Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71(1):135–171.

Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Economet- rica, 70(1):191–221.

Bai, Z. and Silverstein, J. (2010). Spectral Analysis of Large Dimensional Random Matrices. Statistics. Springer-Verlag, New York, second edition.

Bailey, D., Borwein, J., Lopez de Prado, M., and Zhu, Q. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the American Mathematical Society, 61(5):458–471.

138 Bailey, D., Ger, S., Lopez de Prado, M., Sim, A., and Wu, K. (2015). Statistical Overfitting and Backtest Performance, chapter 20. ISTE Press - Elsevier.

Bailey, D. and Lopez de Prado, M. (2012). The sharpe ratio efficient frontier. Journal of Risk, 15(2):191– 221.

Baker, M., Bradley, B., and Wurgler, J. (2011). Benchmarks as limits to arbitrage: understanding the low-volatility anomaly. Financial Analysts Journal, 67(1):40–54.

Balchunas, E. (2016). How the U.S. government inadvertently launched a $3 trillion industry. http: //www.bloomberg.com/features/2016-etf-files/.

Bali, T. and Cakici, N. (2008). Idiosyncratic volatility and the cross section of expected returns. Journal of Financial and Quantitative Analysis, 43(1):29–58.

Ball, C. and Torous, W. (1984). The Maximum Likelihood Estimation of Security Price Volatility: Theory, Evidence, and Application to Option Pricing. Journal of Business, 57(1):97–112.

Banz, R. W. (1981). The relationship between return and market value of common stocks. Journal of Financial Economics, 9(1):3–18.

Barber, B. and Odean, T. (2008). All That Glitters: The Effect of Attention and News on the Buying Behavior of Individual and Institutional Investors. Review of Financial Studies, 21(2):785–818.

Barberis, N. and Huang, M. (2008). Stocks as Lotteries: The Implications of Probability Weighting for Security Prices. American Economic Review, 98(5):2066–2100.

Barberis, N. and Thaler, R. (2003). A survey of behavioral finance. In Constantinides, G., Harris, M., and Stulz, R. M., editors, Handbook of the Economics of Finance, volume 1, Part 2, chapter 18, pages 1053–1128. Elsevier, 1 edition.

Barndorff-Nielsen, O. and Shephard, N. (2002). Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models. Econometrica, 72(3):885–925.

Barroso, P. and Santa-Clara, P. (2015). Momentum has its moments. Journal of Financial Economics, 116:111–120.

Basu, S. (1977). Investment performance of common stocks in relation to their price-earnings ratios: A test of the efficient market hypothesis. The Journal of Finance, 32(3):663–82.

Basu, S. (1983). The relationship between earnings’ yield, market value and return for nyse common stocks: Further evidence. Journal of Financial Economics, 12(1):129–156.

Bates, D. (1996). Jumps and stochastic volatility: Exchange rate processes implicit in deutsche mark options. Review of Financial Studies, 9(1):69–107.

Beck, N., Hsu, J., Kalesnik, V., and Kostka, H. (2016). Will your factor deliver? An examination of factor robustness and implementation costs. Financial Analysts Journal, 72(5):58–82.

Benaych-Georges, F. and Nadakuditi, R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521.

Bender, J., Brett Hammond, P., and Mok, W. (2014). Can Alpha Be Captured by Risk Premia? The Journal of Portfolio Management, 40(2):18–29.

139 Berk, J., Green, R., and Naik, V. (1999). Optimal investment, growth options, and security returns. The Journal of Finance, 54(5):1553–1607.

Berk, J. B. (1995). A critique of size-related anomalies. Review of Financial Studies, 8(2):275–286.

Best, M. and Grauer, R. (1991). On the Sensitivity of Mean-Variance-Efficient Portfolios to Changes in Asset Means : Some Analytical and Computational Results. The Review of Financial Studies, 4(2):315–342.

Black, F. (1976). Studies of stock price volatility changes. In Proceedings of the 1976 Meetings of the Business and Economics Statistics Section, American Statistical Association, pages 177–181.

Black, F. (1993). Beta and Return. The Journal of Portfolio Management, 20(1):8–18.

Black, F., Jensen, M., and Scholes, M. (1972). The Capital Asset Pricing Model: Some Empirical Findings, pages 79–124. Praeger Publishers, New York.

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of political economy, 81(3):637–654.

Blin, O., Ielpo, F., Lee, J., and Teiletche, J. (2017a). Macro risk-based investing to alternative risk premia. Technical report, Unigestion.

Blin, O., Lee, J., and Teiletche, J. (2017b). Alternative risk premia investing: from theory to practice. Technical report, Unigestion.

Blitz, D. and Van Vliet, P. (2007). The Volatility Effect: Lower Risk without Lower Return. The Journal of Portfolio Management, 34(1):102–113.

Blitz, D. and Van Vliet, P. (2011). Dynamic strategic asset allocation: Risk and return across economic regimes. Journal of Asset Management, 12(5):360–375.

Blume, M. and Stambaugh, R. (1983). Biases in computed returns: An application to the size related effect. Journal of Financial Economics, 12(3):387–404.

Bouchaud, J.-P. (2013). The Endogenous Dynamics of Markets: Price Impact, Feedback Loops and Instabilities, chapter 15. Risk Books, second edition.

Bouchaud, J.-P., Laloux, L., Miceli, A., and Potters, M. (2007). Large Dimension Forecasting Models and Random Singular Value Spectra. The European Physical Journal B, 55(2):201–207.

Bouchaud, J.-P. and Potters, M. (2009). Theory of Financial Risk and Derivative Pricing: From Statis- tical Physics to Risk Management. Cambridge University Press, second edition.

Bouchaud, J.-P. and Potters, M. (2011). Financial Applications of Random Matrix Theory: a short review, chapter 40. Oxford University Press.

Bouchaud, J.-P., Potters, M., Laloux, L., Ciliberti, S., Lempérière, Y., Beveratos, A., and Simon, G. (2016). Deconstructing the Low-Vol anomaly.

Bowley, A. (1920). Elements of Statistics. Scribner’s.

Box, G., Jenkins, G., Reinsel, G., and Ljung, G. (2015). Time series analysis: forecasting and control. Probability and Statistics. John Wiley & Sons, fifth edition.

Boyer, B., Mitton, T., and Vorkink, K. (2010). Expected Idiosyncratic Skewness. Review of Financial Studies, 23(1):169–202.

140 Braun, P., Nelson, D., and Sunier, A. (1995). Good News, Bad News, Volatility and Betas. The Journal of Finance, 50(5):1575–1603.

Brealey, R., Myers, S., and Allen, F. (2003). Principles of Corporate Finance. McGraw-Hill, New York, 7th edition.

Brière, M., Lehalle, C.-A., Nefedova, T., and Raboun, A. (2019). Stock Market Liquidity and the Trading Costs of Asset Pricing Anomalies. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3380239.

Brinson, G. P., Hood, L. R., and Beebower, G. L. (1986). Determinants of portfolio performance. Financial Analysts Journal, 42(4):39–44.

Brown, S., Goetzmann, W., and Ibbostson, R. (1999). Offshore Hedge Funds : Survival and Perfor- mance. Journal of Business, 72:91–117.

Bryan, A., Choy, J., Johnson, B., Lamont, K., and Prineas, A. (2019). A global guide to strategic-beta exchange-traded products 2019. Technical report, Morningstar.

Bucci, F., Matromatteo, I., Eisler, Z., Lillo, F., Bouchaud, J.-P., and Lehalle, C.-A. (2018). Co- impact: Crowding effects in institutional trading activity. https://papers.ssrn.com/sol3/ papers.cfm?abstract_id=3168684.

Bun, J., Bouchaud, J.-P., and Potters, M. (2016). On the overlaps between eigenvectors of correlated random matrices. https://arxiv.org/abs/1603.04364.

Cahan, R. and Luo, Y. (2013). Standing out from the crowd: Measuring crowding in quantitative strate- gies. The Journal of Portfolio Management, 39:14–23.

Calluzzo, P., Moneta, F., and Topaloglu, S. (2015). Information travels quickly, institutional investors re- act quickly, and anomalies decay quickly. http://risk2016.institutlouisbachelier. org/SharedFiles/PAPER_CALLUZZO.pdf.

Carhart, M. (1997). On Persistence in Mutual Fund Performance. The Journal of Finance, 52(1):57–82.

Cazalet, Z. and Roncalli, T. (2014). Facts and fantasies about factor investing. www.ssrn.com/ abstract=2524547.

Cerniglia, J. and Fabozzi, F. J. (2018). Academic, practitioner, and investor perspectives on factor investing. The Journal of Portfolio Management, 44(4):10–16.

Chan, K. and Chen, N. (1988). An unconditional asset-pricing test and the role of firm size as an instrumental variable for risk. The Journal of Finance, 43(2):309–325.

Chan, K. and Chen, N. (1991). Structural and return characteristics of small and large firms. The Journal of Finance, 46(4):1467–1484.

Chan, K., Chen, N., and Hsieh, D. (1985). An exploratory investigation of the firm size effect. Journal of Financial Economics, 14(3):451–471.

Chan, L., Karceski, J., and Lakonishok, J. (2000). New paradigm or same old hype in equity investing. Financial Analysis Journal, pages 23–36.

Chan, L. K. C., Hamao, Y., and Lakonishok, J. (1991). Fundamentals and stock returns in japan. The Journal of Finance, 46(5):1739–64.

141 Chen, A. and Velikov, M. (2017). Accounting for the anomaly zoo: A trading cost perspective.

Chen, L., Jiang, G., Xu, D., and Yao, T. (2012). Dissecting the idiosyncratic anomaly.

Chen, L. and Zhao, X. (2009). Understanding the value and size premia: What can we learn from stock migrations?

Chen, N., Roll, R., and Ross, S. (1986). Economic forces and the stock market. Journal of Business, 59(3):383–403.

Cho, T. (2019). Turning Alpha into Betas: Arbitrage and Endogeneous Risk. https://papers. ssrn.com/sol3/papers.cfm?abstract_id=3430041.

Chopra, V. and Ziemba, W. T. (1993). The Effect of Errors in Means, Variances, and Covariances on Optimal Portfolio Choice. The Journal of Portfolio Management, 19(2):6–11.

Chordia, T., Goyal, A., and Saretto, A. (2017). p-hacking: Evidence from two million trading strategies. https://www.baylor.edu/business/lonestarconference/doc.php/ 292031.pdf.

Chordia, T., Subrahmanyam, A., and Tong, Q. (2014). Have capital market anomalies attenuated in the recent era of high liquidity and trading activity? Journal of Accounting and Economics, 58(1):41–58.

Chow, T.-m., Hsu, J., Kalesnik, V., and Little, B. (2011). A Survey of Alternative Equity Index Strate- gies. Financial Analysts Journal, 67(5):37–57.

Christie, A. and Hertzel, M. (1981). Capital asset pricing ”anomalies”: Size and other correlations.

Ciliberti, S., Sérié, E., Simon, G., Lempérière, Y., and Bouchaud, J.-P. (2019). The size premium in equity markets: Where is the risk? The Journal of Portfolio Management.

Clare, A., Motson, N., and Thomas, S. (2013a). An evaluation of alternative equity indices - Part 1 : Heuristic and optimised weighting schemes. Technical report, Cass Business School.

Clare, A., Motson, N., and Thomas, S. (2013b). An Evaluation of Alternative Equity Indices - Part 2 : Fundamental weighting schemes. Technical report, Cass Business School.

Clarke, R., De Silva, H., and Thorley, S. (2006). Minimum Variance Portfolios in the U.S. Equity Market. The Journal of Portfolio Management, 33(1):10–24.

Clemens, M. (2012). Dividend investing: Strategy for long-term outperformance.

Cochrane, J. (1991). Volatility tests and efficient markets : A review essay. Journal of Monetary Economics, 27(3):463–485.

Cochrane, J. (1996). A Cross-Sectional Test of an Investment-Based Asset Pricing Model. Journal of Political Economy, 104.

Cochrane, J. (1999). Portfolio Advice for a Multifactor World. Economic Perspectives, 23(3):59–78.

Cochrane, J. (2011). Presidential address: Discount rates. The Journal of Finance, 66(4):1047–1108.

Connor, G. (1995). The three types of factor models: A comparison of their explanatory power. Finan- cial Analysts Journal, 51(3):42–46.

Connor, G. and Korajczyk, R. (1993). A test for the number of factors in an approximate factor model. The Journal of Finance, 48(4):1263–1291.

142 Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1:223–236.

Cont, R. (2007). Volatility Clustering in Financial Markets: Empirical Facts and Agent-Based Models, pages 289–309. Springer, Berlin, Heidelberg.

Crain, M. (2011). A literature review of the size effect. https://papers.ssrn.com/sol3/ papers.cfm?abstract_id=1710076.

Cremers, M. and Petajisto, A. (2009). How active is your fund manager? a new measure that predicts performance. Review of Financial Studies, 22:3329–3365.

D’Agostino, R. and Pearson, E. (1973). Tests for departure from normality. Biometrika, 60(3):613–622.

Daniel, K., Grinblatt, M., Titman, S., and Wermers, R. (1997). Measuring Mutual Fund Performance with Characteristic-Based Benchmarks. The Journal of Finance, 52(3):1035–1058.

Daniel, K. and Moskowitz, T. (2016). Momentum crashes. Journal of Financial Economics, forthcom- ing, 122(2):221–247.

Daniel, K. and Titman, S. (1997). Evidence on the Characteristics of Cross Sectional Variation in Stock Returns. The Journal of Finance, 52(1):1–33.

De Bondt, W. and Thaler, R. (1987). Further evidence on investor overreaction and stock market sea- sonality. The Journal of Finance, 42(3):557–81.

Demey, P., Maillard, S., and Roncalli, T. (2010). Risk-Based Indexation. www.lyxor.com.

Dichev, I. (1998). Is the risk of bankruptcy a systematic risk? The Journal of Finance, 53(3):1131–1147.

Diebold, F. (2006). Elements of Forecasting. Cengage Learning, fourth edition.

Dimson, E. and Marsh, P. (1999). Murphy’s law and market anomalies. The Journal of Portfolio Management, 25(2):53–69.

Dimson, E., Marsh, P., and Staunton, M. (2002). Triumph of the Optimists: 101 Years of Global Invest- ment Returns. Princeton University Press.

Dimson, E., Marsh, P., and Staunton, M. (2011). Investment style: Size, value and momentum, pages 41–54. Credit Suisse Research Institute, Zurich.

Dimson, E., Marsh, P., and Staunton, M. (2017). Factor-based investing: the long-term evidence. The Journal of Portfolio Management, 43(5):15–37.

Ding, Z., Granger, C., and Engle, R. (1993). A long memory property of stock market returns and a new model. Journal of Empirical Finance, 1(1):83–106.

Doole, S., Bayraktar, M., Kassam, A., and Radchenko, S. (2015). Lost in the crowd? identifying and measuring crowded strategies and trades. Technical report, MSCI.

Doornik, J. and Hansen, H. (2008). An omnibus test for univariate and multivariate normality. Oxford Bulletin of Economics and Statistics, 70(1):927–939.

Durlauf, S. N., Blume, L., et al. (2008). The new Palgrave dictionary of economics, volume 6. Palgrave Macmillan Basingstoke.

Engle, R. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50:987–1007.

Engle, R. and Bollerslev, T. (1986). Modelling the persistence of conditional variances. Econometric Reviews, 5(1):1–50.

Engle, R. and Ferstenberg, R. (2007). Execution Risk. The Journal of Portfolio Management, 33(2):34– 45.

Engle, R., Ferstenberg, R., and Russell, J. (2012). Measuring and modeling execution cost and risk. The Journal of Portfolio Management, 38(2):14–28.

Esakia, M., Goltz, F., Sivasubramanian, S., and Ulahel, J. (2017). Smart Beta Replication Costs. https://risk.edhec.edu/publications/smart-beta-replication-costs.

Evans, J. and Archer, S. (1968). Diversification and the Reduction of Dispersion: An Empirical Analysis. The Journal of Finance, 23(5):761–767.

Fabozzi, F., Foccardi, S., and Kolm, P. (2008). Financial Modeling of the Equity Market: From CAPM to Cointegration. Wiley.

Fama, E. (1965). The behavior of stock-market prices. Journal of Business, 38(1):34–105.

Fama, E. and French, K. (1992). The cross-section of expected stock returns. The Journal of Finance, 47(2):427–465.

Fama, E. and French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1):3–56.

Fama, E. and French, K. (1995). Size and book-to-market factors in earnings and returns. The Journal of Finance, 50(1):131–155.

Fama, E. and French, K. (1996). Multifactor explanations of asset pricing anomalies. The Journal of Finance, 51(1):55–84.

Fama, E. and French, K. (2007). Migration. Financial Analysts Journal, 63(3):48–58.

Fama, E. and French, K. (2008). Dissecting anomalies. The Journal of Finance, 63(4):653–1678.

Fama, E. and French, K. (2010). Luck versus skill in the cross-section of mutual fund returns. The Journal of Finance, 65(5):1915–1947.

Fama, E. and French, K. (2012). Size, value, and momentum in international stock returns. Journal of Financial Economics, 105(3):457–472.

Fama, E. and French, K. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1):1–22.

Fama, E. and MacBeth, J. (1973). Risk, Return, and Equilibrium: Empirical Tests. The Journal of Political Economy, 81(3):607–636.

Fama, E. F. (1998). Market efficiency, long-term returns, and behavioral finance. Journal of financial economics, 49(3):283–306.

Fama, E. F. and French, K. R. (2018). Choosing factors. Journal of Financial Economics, 128(2):234– 252.

144 Feng, G., Giglio, S., and Xiu, D. (2019). Taming the factor zoo: A test of new factors. Working Paper 25481, National Bureau of Economic Research.

Fisher, L. and Lorie, J. H. (1970). Some studies of variability of returns on investments in common stocks. The Journal of Business, 43(2):99–134.

Florens, J.-P., Marimoutou, V., and Peguin-Feissolle, A. (2007). Econometric Modeling and Inference. Cambridge University Press.

Foucault, T., Pagano, M., and Roell, A. (2013). Market Liquidity: Theory, Evidence, and Policy. Oxford University Press.

Frazzini, A., Israel, R., and Moskowitz, T. (2018). Trading costs. https://papers.ssrn.com/ sol3/papers.cfm?abstract_id=3229719.

Frazzini, A. and Pedersen, L. (2014). Betting against beta. Journal of Financial Economics, 111:1–25.

Frost, P. and Savarino, J. (1986). An empirical Bayes approach to efficient portfolio selection. Journal of Financial and Quantitative Analysis, 21:293–305.

Frost, P. and Savarino, J. (1988). For better performance: Constrain portfolio weights. The Journal of Portfolio Management, 15(1):29–34.

Fuhr, D. and Kelly, S. (2011). ETF landscape industry review. Technical report, BlackRock.

Fung, W. and Hsieh, D. (1997). Survivorship Bias and Investment Style in the Return of CTAs. The Journal of Portfolio Management, 24(1):30–41.

Gabaix, X. (2008). Power laws in economics and finance. Working Paper 14299, National Bureau of Economic Research.

Gander, P., Leveau, D., and Pfiffner, T. (2012). Categorization of indices: Do all roads lead to Rome? Journal of Index Investing, 3(3):12–17.

Garman, M. and Klass, M. (1980). On the Estimation of Security Price Volatilities from Historical Data. Journal of Business, 53(1):67–78.

Garrett, T. and Sobel, R. (1999). Gamblers favor skewness, not risk: Further evidence from United States’ lottery games. Economics Letters, 63(1):85–90.

Geczy, C. and Samonov, M. (2016). Two centuries of price return momentum. Financial Analysts Journal, 72.

Giardina, I. and Bouchaud, J. (2003). Bubbles, crashes and intermittency in agent based market models. The European Physical Journal B-Condensed Matter and Complex Systems, 31(3):421–437.

Glosten, L., Jagannathan, R., and Runkle, D. (1992). On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks. The Journal of Finance, 48(5):1779–1801.

Gouriéroux, C., Scaillet, O., and Szafarz, A. (1997). Économétrie de la finance, analyses historiques. Économie et Statistiques Avancées. Economica.

Graham, B. and Dodd, D. L. (1934). Security analysis. New York: Whittlesey House, McGraw-Hill Book Co.

Grant, J. and Ahmerkamp, J. (2013). The returns to carry and momentum strategies: Business cycles, hedge fund capital and limits to arbitrage.

Green, J., Hand, J., and Zhang, F. (2013). The remarkable multidimensionality in the cross section of expected US stock returns. https://pdfs.semanticscholar.org/0b3f/d0250975db6a559052bbd3ec07fce245f3bb.pdf.

Greenwood, R. and Thesmar, D. (2011). Stock price fragility. Journal of Financial Economics, 102(3):471–490.

Grinold, R. and Kahn, R. (2000). Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk. McGraw-Hill, second edition.

Guéant, O. (2016). Optimal Execution and Liquidation in Finance. Financial Mathematics. Chapman and Hall/CRC.

Hamdan, R., Pavlowsky, F., Roncalli, T., and Zheng, B. (2016). A primer on alternative risk premia. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2766850.

Hanson, S. G. and Sunderam, A. (2013). The growth and limits of arbitrage: Evidence from short interest. The Review of Financial Studies, 27(4):1238–1286.

Härdle, W. and Simar, L. (2015). Applied Multivariate Statistical Analysis. Springer-Verlag, Berlin, fourth edition.

Harvey, C., Liu, Y., and Zhu, H. (2016). ... and the cross-section of expected returns. Review of Financial Studies, 29(1):5–68.

Harvey, C. and Siddique, A. (1999). Autoregressive conditional skewness. Journal of Financial and Quantitative Analysis, 34(4):465–487.

Harvey, C. and Siddique, A. (2000). Conditional skewness in asset pricing tests. The Journal of Finance, 55(3):1263–1295.

Hasbrouck, J. (2007). Empirical Market Microstructure: The Institutions, Economics, and Econometrics of Securities Trading. Oxford University Press, first edition.

Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning. Springer, second edition.

Haugen, R. and Baker, N. (1991). The efficient market inefficiency of capitalization-weighted stock portfolios. The Journal of Portfolio Management, 17(3):35–40.

Haugen, R. and Baker, N. (2012). Low Risk Stocks Outperform within All Observable Markets of the World. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2055431.

Haugen, R. and Heins, A. (1975). Risk and the Rate of Return on Financial Assets: Some Old Wine in New Bottles. The Journal of Financial and Quantitative Analysis, 10(5):775–784.

Hendricks, D., Patel, J., and Zeckhauser, R. (1993). Hot hands in mutual funds: Short-run persistence of relative performance, 1974-1988. The Journal of Finance, 48(1):93–130.

Hirshleifer, D. (2001). Investor psychology and asset pricing. MPRA paper, University Library of Munich, Germany.

Hirshleifer, D., Hou, K., and Teoh, S. H. (2012). The accrual anomaly: risk or mispricing? Management Science, 58(2):320–335.

Horowitz, J., Loughran, T., and Savin, N. (2000a). The disappearing size effect. Research in Economics, 54(1):83–100.

Horowitz, J., Loughran, T., and Savin, N. (2000b). Three analyses of the firm size premium. Journal of Empirical Finance, 7(2):143–153.

Hotelling, H. and Solomons, L. M. (1932). The Limits of a Measure of Skewness. Annals of Mathematical Statistics, 3(2):141–142.

Hou, K. and Loh, R. K. (2016). Have we solved the idiosyncratic volatility puzzle? Journal of Financial Economics, 121(1):167–194.

Hou, K., Xue, C., and Zhang, L. (2014). Digesting Anomalies: An Investment Approach. The Review of Financial Studies, 28(3):650–705.

Hou, K., Xue, C., and Zhang, L. (2017a). A comparison of new factor models.

Hou, K., Xue, C., and Zhang, L. (2017b). Replicating anomalies. Working Paper 23394, National Bureau of Economic Research.

Huberman, G. and Kandel, S. (1987). Mean-variance spanning. The Journal of Finance, 42(4):873–888.

Huij, J., Lansdorp, S., Blitz, D., and Vliet, P. (2014). Factor investing: Long-only versus long-short. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2417221.

Hwang, S. and Satchell, S. (1999). Modelling emerging market risk premia using higher moments. International Journal of Finance and Economics, 4(4):271–296.

Ielpo, F., Merhy, C., and Simon, G. (2017). Engineering Investment Process: Making Value Creation Repeatable. Elsevier Science.

Ilmanen, A., Israel, R., Moskowitz, T., Thapar, A., and Wang, F. (2019). Factor premia and factor timing: A century of evidence. Working paper, AQR.

Invesco (2016). Basic concepts for understanding factor investing. Working papers, Invesco.

Israel, R., Jiang, S., and Ross, A. (2017). Craftsmanship alpha: An application to style investing. The Journal of Portfolio Management, 44(2):23–39.

Iwasawa, S. and Ushiwama, T. (2013). A Behavioral Economics Exploration into the Volatility Anomaly. Public Policy Review, 9(3):457–490.

Jarque, C. and Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6(3):255–259.

Jegadeesh, N. and Titman, S. (1993). Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency. The Journal of Finance, 48(1):65–91.

Jennrich, R. (1970). An Asymptotic Chi2-Test for the Equality of Two Correlation Matrices. Journal of the American Statistical Association, 65(330):904–912.

Jensen, M. (1968). The Performance of Mutual Funds in the Period 1945-1964. The Journal of Finance, 23(2):389–416.

Jiang, G., Yao, T., and Yu, T. (2007). Do Mutual Funds Time the Market? Evidence from Portfolio Holdings. Journal of Financial Economics, 86:724–758.

Jobson, J. and Korkie, R. (1980). Estimation for Markowitz efficient portfolios. Journal of the American Statistical Association, 75:544–554.

Jobson, J. and Korkie, R. (1981a). Performance hypothesis testing with the Sharpe and Treynor measures. The Journal of Finance, 36(4):889–908.

Jobson, J. and Korkie, R. (1981b). Putting Markowitz theory to work. The Journal of Portfolio Management, 7(4):70–74.

Johnson, B., Boccellari, T., Bryan, A., and Rawson, M. (2015). Morningstar’s active/passive barometer, a new yardstick for an old debate. Technical report, Morningstar.

Johnson, N. and Welch, B. (1940). Applications of the non-central t-distribution. Biometrika, 31(3-4):362–389.

Jondeau, E., Poon, S., and Rockinger, M. (2007). Financial Modeling Under Non-Gaussian Distributions. Springer Finance. Springer-Verlag, London.

Jorion, P. (1986). Bayes-Stein Estimation for Portfolio Analysis. Journal of Financial and Quantitative Analysis, 21:279–292.

Jorion, P. (1988). On jump processes in the foreign exchange and stock markets. Review of Financial Studies, 1(4):427–445.

Kacperczyk, M., Sialm, C., and Zheng, L. (2005). On the industry concentration of actively managed equity mutual funds. The Journal of Finance, 60(4):1983–2011.

Karatzas, I. and Shreve, S. (1998). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York, second edition.

Keim, D. (1983). Size-related anomalies and stock return seasonality: Further empirical evidence. Journal of Financial Economics, 12(1):13–32.

Keim, D. (2003). The cost of trend chasing and the illusion of momentum profits.

Keim, D. and Madhavan, A. (1997). Transactions costs and investment style: an inter-exchange analysis of institutional equity trades. Journal of Financial Economics, 46(3):265–292.

Kendall, M. and Stuart, A. (1977). The Advanced Theory of Statistics, volume 1. Griffin, London.

Khandani, A. and Lo, A. (2011). What happened to the quants in August 2007? Evidence from factors and transactions data. Journal of Financial Markets, 14(1):1–46.

Kim, T. and White, H. (2004). On more robust estimation of skewness and kurtosis. Finance Research Letters, 1:56–73.

Kissell, R. and Glantz, M. (2003). Optimal Trading Strategies: Quantitative Approaches for Managing Market Impact and Trading Risk. Amacom.

Knez, P. and Ready, M. (1997). On the robustness of size and book-to-market in cross-sectional regressions. The Journal of Finance, 52(4):1355–1382.

Koijen, R., Moskowitz, T., Pedersen, L., and Vrugt, E. (2018). Carry. Journal of Financial Economics, 127(2):197–225.

Korajczyk, R. (1999). Asset Pricing and Portfolio Performance: Models, Strategy and Performance Metrics. Risk Books.

Korajczyk, R. and Sadka, R. (2004). Are momentum profits robust to trading costs? The Journal of Finance, 59:1039–1082.

Kosowski, R., Timmermann, A., Wermers, R., and White, H. (2006). Can Mutual Fund “Stars” Really Pick Stocks? New Evidence from a Bootstrap Analysis. The Journal of Finance, 61(6):2551–2595.

Lakonishok, J., Shleifer, A., and Vishny, R. (1994). Contrarian investment, extrapolation, and risk. The Journal of Finance, 49(5):1541–1578.

Laloux, L., Cizeau, P., Bouchaud, J.-P., and Potters, M. (1999). Noise Dressing of Financial Correlation Matrices. Physical Review Letters, 83(7):1467–1470.

Lamoureux, C. and Sanger, G. (1989). Firm Size and Turn-of-the-Year Effects in the OTC/NASDAQ Market. The Journal of Finance, 44(5):1219–1245.

Landier, A., Simon, G., and Thesmar, D. (2015). The Capacity of Trading Strategies. www.ssrn.com/abstract=2585399.

Larntz, K. and Perlman, M. (1988). A simple test for the equality of correlation matrices, volume 2, pages 289–298. Springer-Verlag, New York.

Ledoit, O. and Wolf, M. (2002). Some Hypothesis Tests for the Covariance Matrix when the Dimension is Large Compared to the Sample Size. Annals of Statistics, 30(4):1081–1102.

Ledoit, O. and Wolf, M. (2008). Robust performance hypothesis testing with the Sharpe ratio. Journal of Empirical Finance, 15(5):850–859.

Lehalle, C.-A. and Laruelle, S. (2013). Market Microstructure in Practice. World Scientific Publishing.

Lehalle, C.-A. and Simon, G. (2019). Portfolio selection with active strategies: How long only constraints shape convictions. Available at SSRN 3405228.

Leinweber, D. J. (2007). Stupid Data Miner Tricks, Overfitting the S&P 500. Journal of Investing, 16(1):15–22.

Lempérière, Y., Deremble, C., Nguyen, T.-T., Seager, P., Potters, M., and Bouchaud, J.-P. (2015). Risk Premia: Asymmetric Tail Risks and Excess Return. https://arxiv.org/abs/1409.7720.

Lempérière, Y., Deremble, C., Seager, P., Potters, M., and Bouchaud, J.-P. (2014). Two centuries of trend following. The Journal of Investment Strategies, 3.

Lesmond, D., Schill, M., and Zhou, C. (2004). The illusory nature of momentum profits. Journal of Financial Economics, 71:349–380.

Lezmi, E., Malongo, H., Roncalli, T., and Sobotka, R. (2018). Portfolio allocation with skewness risk: A practical guide. Available at SSRN 3201319.

Li, X., Sullivan, R., and García-Feijóo, L. (2014). The Limits to Arbitrage and the Low-Volatility Anomaly. Financial Analysts Journal, 70(1):52–63.

Linnainmaa, J. and Roberts, M. (2016). The history of the cross section of stock returns. Working Paper 22894, National Bureau of Economic Research.

Lintner, J. (1965). The Valuation of Risk Assets and the Selection of Risky Investments in Stock Portfolios and Capital Budgets. The Review of Economics and Statistics, 47(1):13–37.

Lo, A. (1991). Long-term memory in stock market prices. Econometrica, 59(5):1279–1313.

Lo, A. (2002). The statistics of Sharpe Ratios. Financial Analysts Journal, 58(4):36–52.

Lo, A. (2004). The Adaptive Markets Hypothesis: Market Efficiency from an Evolutionary Perspective. The Journal of Portfolio Management, 30(5):15–29.

Lo, A. (2008). Where Do Alphas Come From?: A Measure of the Value of Active Investment Management. Journal of Investment Management, 6(2):1–29.

Lo, A. (2016). What is an Index? The Journal of Portfolio Management, 42(2):21–36.

Lo, A. and MacKinlay, A. (1990). Data-snooping biases in tests of financial asset pricing models. Review of Financial Studies, 3:431–467.

Lou, D. and Polk, C. (2012). Comomentum: Inferring arbitrage activity from return correlations. SSRN Electronic Journal.

Malevergne, Y. and Sornette, D. (2004). Collective Origin of the Coexistence of Apparent RMT Noise and Factors in Large Sample Correlation Matrices. Physica A: Statistical Mechanics and its Applications, 331(3-4):660–668.

Malkiel, B. (2014). Is smart beta really smart? The Journal of Portfolio Management, 40:127–134.

Malkiel, B. G. (2003). The efficient market hypothesis and its critics. Journal of Economic Perspectives, 17(1):59–82.

Mandelbrot, B. (1963). The variation of certain speculative prices. Journal of Business, 36(4):394–419.

Markowitz, H. (1952). Portfolio selection. The Journal of Finance, 7(1):77–91.

Markowitz, H. (1959). Portfolio Selection: Efficient Diversification of Investments. Basil Blackwell, Cambridge, MA.

Marčenko, V. and Pastur, L. (1967). Distribution of Eigenvalues of Some Sets of Random Matrices. Math. USSR-Sbornik, 1(4):457–483.

McLean, D. and Pontiff, J. (2016). Does Academic Research Destroy Stock Return Predictability? The Journal of Finance, 71(1):5–32.

Mehta, M. (2004). Random Matrices. Pure and Applied Mathematics. Elsevier, third edition.

Merton, R. (1973). An intertemporal capital asset pricing model. Econometrica, 41(5):867–887.

Merton, R. (1980). On Estimating the Expected Return on the Market: An Exploratory Investigation. Journal of Financial Economics, 8(4):323–361.

Meucci, A. (2014). Linear factor models: Theory, applications and pitfalls. www.ssrn.com/abstract=1635495.

Meucci, A., Santangelo, A., and Deguest, R. (2015). Risk budgeting and diversification based on opti- mized uncorrelated factors. Available at SSRN 2276632.

Michaud, R. (1989). The Markowitz Optimization Enigma: Is “Optimized” Optimal? Financial Analysts Journal, 45:31–42.

Miller, R. and Gehr, A. (1978). Sample size bias and Sharpe’s performance measure: A note. Journal of Financial and Quantitative Analysis, 13(5):943–946.

Mittoo, U. and Thompson, R. (1990). Do capital markets learn from financial economists?

Moghtader, P. (2017). Distilling the risks of smart beta. Lazard insights, Lazard.

Monasson, R. and Villamaina, D. (2015). Estimating the principal components of correlation matrices from all their empirical eigenvectors. Europhysics Letters, 112(5):50001.

Montier, J. (2007). CAPM is CRAP (or, The Dead Parrot Lives!), chapter 35. Wiley.

Moors, J. (1988). A quantile alternative for kurtosis. The Statistician, 37(1):25–32.

Moro, E., Vicente, J., Moyano, L., Gerig, A., Farmer, D., Vaglica, G., Lillo, F., and Mantegna, R. (2009). Market impact and trading profile of hidden orders in stock markets. Physical Review E, 80:066102.

Mossin, J. (1966). Equilibrium in a Capital Asset Market. Econometrica, 34(4):768–783.

Muijsson, C., Fishwick, E., and Satchell, S. (2015). The low beta anomaly and interest rates.

Muirhead, R. (1987). Developments in Eigenvalue Estimation. In Gupta, A., editor, Advances in Multivariate Statistical Analysis, volume 5 of Theory and Decision Library, pages 277–288. Springer.

Nelson, D. (1991). Conditional Heteroskedasticity in Asset Returns: A New Approach. Econometrica, 59(2):347–370.

Novy-Marx, R. (2013). The other side of value: The gross profitability premium. Journal of Financial Economics, 108(1):1–28.

Novy-Marx, R. (2014). Understanding defensive equity.

Novy-Marx, R. and Velikov, M. (2016). A taxonomy of anomalies and their trading costs. Review of Financial Studies, 29:104–147.

Opdyke, J. (2007). Comparing Sharpe ratios: so where are the p-values? Journal of Asset Management, 8(5):308–336.

Pafka, S. and Kondor, I. (2004). Estimated Correlation Matrices and Portfolio Optimization. Physica A-Statistical Mechanics and its Applications, 343:623–634.

Parkinson, M. (1980). The extreme value method for estimating the variance of the rate of return. Journal of Business, 53(1):61–65.

Pastur, L. and Shcherbina, M. (2011). Eigenvalue Distribution of Large Random Matrices. Mathematical Surveys and Monographs. American Mathematical Society.

Patton, A. and Weller, B. (2019). What You See is Not What You Get: The Costs of Trading Market Anomalies. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3034796.

Perold, A. (2004). The Capital Asset Pricing Model. Journal of Economic Perspectives, 18(3):3–24.

Perold, A. (2007). Fundamentally Flawed Indexing. Financial Analysts Journal, 63(6):31–37.

Piotroski, J. (2000). Value investing: The use of historical financial statement information to separate winners from losers. Journal of Accounting Research, 38:1–41.

Plerou, V., Gopikrishnan, P., Rosenow, B., Amaral, L., and Stanley, E. (2002). A Random Matrix Approach to Cross-Correlations in Financial Data. Physical Review E, 65(6):066126.

Polk, C., Haghbin, M., and de Longis, A. (2019). Time-Series Variation in Factor Premia: The Influence of the Business Cycle. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3377677.

Potters, M., Bouchaud, J.-P., and Laloux, L. (2005). Financial Applications of Random Matrix Theory: Old Laces and New Pieces. Acta Physica Polonica B, 36(9):623–634.

Pukthuanthong, K. and Roll, R. (2015). A Protocol for Factor Identification. www.ssrn.com/abstract=2342624.

Reinganum, M. R. (1981). Misspecification of capital asset pricing: Empirical anomalies based on earnings’ yields and market values. Journal of Financial Economics, 9(1):19–46.

Richardson, M. and Smith, T. (1993). A test for multivariate normality in stock returns. Journal of Business, 66(2):295–321.

Rogers, L. and Satchell, S. (1991). Estimating variance from high, low and closing prices. Annals of Applied Probability, 1(4):504–512.

Roll, R. (1981). A possible explanation of the small firm effect. The Journal of Finance, 36(4):879–888.

Roncalli, T. (2013). Introduction to Risk Parity and Budgeting. Financial Mathematics. Chapman & Hall/CRC.

Roncalli, T. (2017). Alternative risk premia: What do we know? Technical report, Amundi.

Rosenberg, B., Reid, K., and Lanstein, R. (1985). Persuasive evidence of market inefficiency. The Journal of Portfolio Management, 11(3):9–16.

Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13(3):341–360.

Rouwenhorst, K. (1998). International Momentum Strategies. The Journal of Finance, 53:267–284.

Rozeff, M. and Kinney, W. (1976). Capital market seasonality: the case of stock returns. Journal of Financial Economics, 3:379–402.

Schwert, G. (1983). Size and stock returns, and other empirical regularities. Journal of Financial Economics, 12(1):3–12.

Schwert, G. (2003). Anomalies and market efficiency. In Constantinides, G., Harris, M., and Stulz, R. M., editors, Handbook of the Economics of Finance, volume 1 of Handbook of the Economics of Finance, chapter 15, pages 939–974. Elsevier.

Schwert, W. (1989). Why Does Stock Market Volatility Change over Time? The Journal of Finance, 44(5):1115–1153.

Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442.

Sharpe, W. (1966). Mutual Fund Performance. Journal of Business, 39(1):119–138.

Shores, S. (2015). Smart beta: Defining the opportunities and solutions. Working paper, BlackRock.

Sloan, R. (1996). Do Stock Prices Fully Reflect Information in Accruals and Cash Flows about Future Earnings? The Accounting Review, 71(3):289–315.

Stanzl, W., Chen, Z., and Watanabe, M. (2006). Price impact costs and the limit of arbitrage. Yale School of Management, Yale School of Management Working Papers.

Sullivan, R., Timmermann, A., and White, H. (1999). Data-Snooping, Technical Trading Rule Performance, and the Bootstrap. The Journal of Finance, 54(5):1647–1691.

Teräsvirta, T. and Zhao, Z. (2011). Stylized facts of return series, robust estimates and three popular models of volatility. Applied Financial Economics, 21(1-2):67–94.

Thode, H. (2002). Testing For Normality, volume 164 of Statistics. CRC Press.

Thompson, J., Baggett, L., Wojciechowski, W., and Williams, E. (2006). Nobels for nonsense. Journal of Post Keynesian Economics, 29(1):3–18.

Treynor, J. (1961). Market Value, Time, and Risk. Unpublished Manuscript.

Treynor, J. (2007). Toward a Theory of Market Value of Risky Assets, chapter 6. Wiley.

Uechi, L., Akutsu, T., Stanley, E., Marcus, A., and Kenett, D. (2015). Sector dominance ratio analysis of financial markets. Physica A: Statistical Mechanics and its Applications, 421:488–509.

Ung, D., Fernandes, R., and Hahn, B. (2015). SPIVA Europe Scorecard, Year-End 2015. Technical report, S&P Dow Jones Indices.

van Dijk, M. (2011). Is size dead? A review of the size effect in equity returns. Journal of Banking & Finance, 35(12):3263–3274.

Vassalou, M. and Xing, Y. (2004). Default risk in equity returns. The Journal of Finance, 59(2):831–868.

Vayanos, D. and Woolley, P. (2013). An Institutional Theory of Momentum and Reversal. Review of Financial Studies, 26(5):1087–1145.

Wermers, R., Yao, T., and Zhao, J. (2012). Forecasting Stock Returns Through an Efficient Aggregation of Mutual Fund Holdings. Review of Financial Studies, 25(12):3490–3529.

Wimmer, B., Chhabra, S., and Wallick, D. (2013). The bumpy road to outperformance. Technical report, Vanguard Research.

Winkelmann, K., Suryanarayanan, R., Hentschel, R., and Varga, K. (2013). Macro Sensitive Portfolio Strategies: Macroeconomic Risk and Asset Cash Flows. MSCI Market Insight, MSCI.

Yang, D. and Zhang, Q. (2000). Drift-independent volatility estimation based on high, low, open, and close prices. Journal of Business, 73(3):477–491.

Zhang, L. (2005). The value premium. The Journal of Finance, 60:67–103.

Zhang, L., Mykland, P., and Aït-Sahalia, Y. (2005). A Tale of Two Time Scales: Determining Integrated Volatility With Noisy High-Frequency Data. Journal of the American Statistical Association, 100(472):1394–1411.
