Niklas Lappe

Beating the Market: A Quantitative Approach to Fundamental Investing

Master Thesis

Chair of Entrepreneurial Risks, Swiss Federal Institute of Technology (ETH) Zurich

Supervision: Sumit Kumar Ram, Prof. Dr. Didier Sornette

May 2020

Contents

Abstract iii

Nomenclature iv

1 Introduction 1

2 Data 3
2.1 Price Data ...... 3
2.2 Fundamental Data ...... 4

3 Methodology 5
3.1 Portfolio and Benchmark Construction ...... 6
3.1.1 Benchmark Construction ...... 6
3.1.2 Portfolio Construction ...... 7
3.2 Prediction Metrics ...... 8
3.3 Ensemble Properties of the Financial Ratios ...... 11
3.3.1 Distributions ...... 11
3.3.2 Joint Distributions ...... 15
3.3.3 Canonical Averaging of the Volatility and Return Time Series conditional on Financial Ratios ...... 18
3.4 Copula Analysis ...... 21
3.4.1 Introduction ...... 21
3.4.2 Calibration ...... 24
3.5 Random Forest Model ...... 29
3.5.1 Mathematical Formulation ...... 29
3.5.2 Random Forest Regressor ...... 30
3.5.3 Training and Testing Sets ...... 32

4 Results 33
4.1 Classification Experiment ...... 33
4.1.1 Profit and Loss Prediction ...... 33
4.1.2 Outperformance and Underperformance Prediction ...... 34
4.2 Basic Experiment ...... 35
4.2.1 Fixed Training Set ...... 35
4.2.2 Dynamic Training Set ...... 38
4.3 Sensitivity Analysis ...... 41
4.3.1 Fixed Training Experiment ...... 41
4.3.2 Table Form of Fixed Training Experiment ...... 44
4.3.3 Comparison of Training Sets ...... 46
4.3.4 Dynamic Training Experiment ...... 50
4.3.5 Industry Performance ...... 53
4.4 Relative Features Experiment ...... 54

4.4.1 Fixed Training Set ...... 55
4.4.2 Dynamic Training Set ...... 58
4.5 Sector Features Experiment ...... 61

5 Conclusion 66

Bibliography 69

A Data 72
A.1 Introduction ...... 72
A.2 Data ...... 73

B Distributions 80
B.1 Joint distributions ...... 114
B.2 Canonical averaging of the volatility and return time series conditional on financial ratios ...... 116
B.3 Copulas ...... 120

C Experiments 128
C.1 Additional Plots for 4.2 ...... 128
C.2 Additional Plots for 4.2.4 ...... 131
C.3 Classifiers for Relative Features ...... 134

Abstract

Financial markets have been a playing field for both private and professional investors aiming to generate profits from trading. Common theories imply that the search for superior returns is hopeless and that most investors are unable to generate returns far above the average market return. This thesis attempts to contradict this hypothesis by developing a machine-learning-based portfolio management algorithm with the goal of outperforming the S&P 500 index. We use a large data set that contains fundamental data for each stock in the S&P 500 over the last 30 years. We conduct an exploratory analysis of this data to generate first insights into the influence of fundamentals on stock performance. Thereafter, we apply a copula analysis to identify the strongest dependencies between fundamentals and stock returns. Eventually, we use a random forest predictor to forecast quarterly stock performance by estimating different performance indicators from the underlying fundamental data. The forecasts of the random forest predictor are then used to build a set of portfolios for different performance indicators and fundamentals. The analysis shows that some of those portfolios are indeed able to outperform the S&P 500 index. The best random forest predictor can be accepted at a significance level of 0.1% against a random predictor.

Keywords: portfolio management, fundamental investing, copulas, machine learning, random forests, efficient market hypothesis, S&P 500

Nomenclature

Symbols

S Stock
M Market
P Stock price
R Return
Q Quarter

α Alpha, return difference between a stock and its benchmark
β Beta, Covariance(R_S, R_M) / Variance(R_M)
LR Log Return Difference metric
IR Information ratio
SR Sharpe ratio

Acronyms and Abbreviations

S&P 500 Standard & Poor's 500 Stock Index
ICB Industry Classification Benchmark
RF Random Forest

Corr Pearson's correlation coefficient
Tau Kendall's tau
Rho Spearman's rho
CDF Cumulative Distribution Function
PDF Probability Density Function

CAPM Capital Asset Pricing Model
APT Arbitrage Pricing Theory
EMH Efficient Market Hypothesis

Chapter 1

Introduction

Ever since the first stock was traded on the Amsterdam Stock Exchange in the beginning of the 17th century,1 both professional and private investors have been trying to generate profits from stock trading. This thesis attempts to add to the universe of investment strategies by developing a machine-learning model which is able to pick undervalued stocks and use those stocks to create outperforming portfolios.
For traditional long-only trades to be profitable, an investor needs to buy an undervalued stock at a low price and resell it at a higher price. Since present asset prices can be obtained easily, the difficult part is estimating the price of an asset in the future. A variety of asset pricing models have been developed to solve this challenge. The most common model is the Capital Asset Pricing Model (Sharpe, 1964; Lintner, 1975), which states that the return of a single security solely depends on the risk-free rate, the general market return and the security's volatility with respect to the systemic risk of the market (beta). It seems intuitive that an increase in beta, which can be understood as an increase in risk, should be rewarded with higher returns, since there is no incentive for a rational investor to accept more risk for less reward. However, the CAPM has been empirically proven to be insufficient for accurate asset pricing. For example, Fama and French (2004) show that there are many examples in which stocks with lower beta provided superior returns compared to their high-beta peers. Therefore, additions have been made to the model by its critics. Black (1972) introduced the zero-beta CAPM, which states that a zero-beta portfolio's return can deviate from the risk-free rate. For this purpose, he replaced the risk-free rate with a stock-specific rate that describes the expected return of the stock for beta = 0. Although it has been shown that this model provides better results than the CAPM, in practice, estimating a stock-specific rate for each security is difficult.
In the decades following Sharpe and Black, several economists (Rubinstein, 1973; Merton, 1973; Breeden, 2005) have adapted the model further and introduced additional components. However, the underlying problem remains: only historical stock returns can be used as proxies for future returns in these models. Roll and Ross (1980) contradicted the CAPM and concluded that beta alone is not sufficient to describe the overall price trajectory of an asset. They proposed a new model, the Arbitrage Pricing Theory, that includes more macroeconomic variables. The four variables suggested by Roll and Ross are inflation, industrial production, risk premiums and the yield curve. Each of those variables has a stock-specific sensitivity which is used to calculate the expected return of the stock. Although daily fluctuations and short-term behaviour of stocks can follow different variables, Ross claims that the long-term performance of a stock is mostly predictable by those four variables.

1The first historically documented public company was the Dutch East India Company in 1602.


Research has shown that, in fact, both the CAPM and the APT have a certain predictability of future stock returns, with the APT outperforming the CAPM (Chen, 1983; Groenewold and Fraser, 1997). These results contradict the prevalent Efficient Market Hypothesis proposed by Fama (1970), which states that (stock) markets reflect all available information and hence can neither be predicted nor outperformed in the long run, because it is impossible to forecast new information. Thus, stock prices are often said to follow a random walk, the same way as incoming information does (Malkiel, 1999). This theory is also the foundation for the pricing of other securities, specifically derivatives that are based on underlying stocks (Black and Scholes, 1973). This thesis is based on the assumption that the EMH is incorrect; otherwise, there would not be any reason for pursuing the analysis and predictions in the following.
The previously mentioned asset pricing models describe the relationship between a small set of specific input variables and the expected return of an asset as linear. In contrast, we will use a random forest predictor on a large set of fundamentals to predict future stock returns without applying an underlying theory like the CAPM or APT. This methodology is motivated by the fact that most "traditional" portfolio managers follow a similar approach and study fundamental data to pick stocks. The approach dates back to Graham and Dodd (1934) and is the core of "value investing". It is one of the most common methods and is often, especially in the long run, preferred to technical analysis. Shiller (2015) studied the influence of one of the most frequently used fundamentals, the Price-to-Earnings ratio (P/E), on future returns and found that low P/E ratios are a strong indicator of good long-term stock performance (Appendix A1).
Using a quantitative approach to fundamental investing is not a new idea and has been implemented by Lewellen (2004) using dividend yield, P/E and book-to-market ratios, and by Pontiff and Schall (1998) using only book-to-market ratios, to name only two. Both papers claim that the predictability of these fundamentals is significant. The goal of this thesis is to answer the question: Are we able to outperform the S&P 500 index using a machine learning model that predicts future stock returns? For this purpose, we increase the number of fundamentals extensively compared to Lewellen, and Pontiff and Schall, to improve the predictability of future stock returns, and eventually apply this to a real-life scenario by building a portfolio and comparing its performance to that of the index over the long term. After a short introduction to the data, we conduct an exploratory analysis and analyse relevant trends and relationships. In the last part, we introduce the random forest model used for the return predictions and backtest a variety of different portfolios that are based on the random forest predictor.

Chapter 2

Data

The stocks used for this project are the members of Standard & Poor's 500 index (S&P 500) as of November 2019. The stocks are categorized by their sectors using the Industry Classification Benchmark by Dow Jones and FTSE on a sector level (Russel, 2019). The system classifies stocks into 41 sectors, of which 37 are included in the S&P 500 index. A detailed list can be found in Appendix A2. For each of the stocks, the data set includes price and fundamental data, which were both obtained from Thomson Reuters Eikon.

2.1 Price Data

The price data obtained from Yahoo Finance contains daily closing prices of each stock for a 30-year period (03/10/1989 - 03/10/2019), for each day the respective stock was traded during this period. We find two issues with the data set. First, not every stock in the index has been public for the last 30 years, so some stocks do not provide full historical price data. Second, some stocks that were part of the index in November 2019 have not been part of the index continuously for the last 30 years, which means that the stock set does not replicate the behaviour of the actual S&P 500 index, but that of the S&P 500 index of November 2019 projected back for 30 years. To demonstrate this, we constructed an equal-weighted portfolio out of the stock set and compared its performance to that of the actual S&P 500 index. The stock set contains a survivorship and new-entrant bias, and the portfolio based on the stock set shows a better performance than the actual S&P 500 index over that period, as can be seen in Figure 2.1:


Figure 2.1: Performance of the actual S&P 500 index over the last 30 years, compared to the performance of our stock set, which includes survivorship bias. The y-axis displays the normalized prices for both portfolios, starting at 100, on a logarithmic scale.

The annualised return of the S&P 500 index over that period was 7.3%, while the equal-weighted stock set would have generated an annualised return of 13.5%. To control for that effect, the benchmarks used for the random forest portfolios are based on the stock set rather than the actual S&P 500. Besides that, the stock set is reduced to the stocks that have been part of the S&P 500 index continuously during the last 30 years to avoid missing data. After applying that rule and cleaning the data set further, the final stock set comprises 222 stocks and 34 sectors. A list of all remaining stocks can be found in the Appendix A2. The price data will be used to calculate stock returns which are used to define prediction metrics later on.
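The size of the survivorship gap can be sanity-checked with a small sketch. The function and the two-point toy price series below are hypothetical; only the two annualised return figures (7.3% and 13.5%) come from the text.

```python
import numpy as np

def annualised_return(prices: np.ndarray, years: float) -> float:
    """Geometric annualised return implied by the first and last price."""
    return (prices[-1] / prices[0]) ** (1.0 / years) - 1.0

years = 30
# Endpoints chosen so the series grow at 7.3% and 13.5% p.a.,
# the figures reported for the index and the biased stock set.
index_series = np.array([100.0, 100.0 * 1.073 ** years])
survivor_series = np.array([100.0, 100.0 * 1.135 ** years])

# The bias is worth roughly 6 percentage points per year.
gap = annualised_return(survivor_series, years) - annualised_return(index_series, years)
```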

2.2 Fundamental Data

The fundamental data contains the balance sheet, the cash flow statement, the income statement, ratio metrics and profit ratios for each stock. The specific fundamentals vary per stock, but in total the data set contains 411 unique fundamentals. There is no fundamental that is consistently reported for every company. A list of all fundamentals can be found in Appendix A2. The fundamentals are reported quarterly or half-yearly, depending on the company. The first quarter of the fundamental data is 28/10/1989 and the last reported quarter is 01/09/2019. Therefore, the fundamental data set is smaller than the price data set, and the models can only be computed for the intersection of both sets. The fundamental data are used as features for the following prediction models.

Chapter 3

Methodology

This chapter introduces the methods applied to the data set to forecast future stock performance and use those results to generate portfolios. First, we introduce a benchmark and a general process for constructing portfolios. Second, we study performance indicators that we use as prediction targets for the prediction model. This is followed by several analyses of the fundamental data that we use as input features to train the prediction model. Last, we define our problem mathematically and specify the random forest model.


3.1 Portfolio and Benchmark Construction

The goal of the thesis is to build portfolios that can outperform the S&P 500 index, or in our case rather the stock set based on the S&P 500 as of November 2019. Therefore, we need to find a subset of the stock set that performs better than the complete stock set overall. It is important to construct the portfolio and the benchmark index in a way that neither benefits nor disadvantages the portfolios against the benchmark. Thus, we use a simple equal-weight approach for both the portfolio and the index. In the first step, the stock set is split into its subsectors according to the Industry Classification Benchmark, and each stock is grouped with its peers according to its sector (Russel, 2019).

3.1.1 Benchmark Construction

For the index that we use as a benchmark to evaluate the performance of the portfolios, we use a buy-and-hold strategy. That means that we buy each stock with its specific weight at the beginning of the 30-year period and hold it throughout the entire period. Therefore, we do not have any reinvestments for the index. Figure 3.1 displays how the weights for each stock are calculated. First, for each sector all stocks in that sector are equal-weighted to calculate a sector index. Second, the sector indexes are again equal-weighted themselves to find the weights for the complete index. Therefore, the weight W(S) of a stock S is calculated as follows:

W(S) = 1 / (n · z),    (3.1)

where n is the number of sectors, which is constant with n = 34, and z is the number of stocks in the sector of stock S. In Figure 3.1, z is denoted k and j for the respective sectors.
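The two-level equal weighting W(S) = 1/(n·z) can be sketched as follows. The function name and the mini-universe of three stocks are hypothetical illustrations, not part of the thesis code.

```python
from collections import defaultdict

def index_weights(sector_of: dict) -> dict:
    """W(S) = 1 / (n * z): equal-weight the n sectors, then equal-weight
    the z stocks inside each sector."""
    members = defaultdict(list)
    for stock, sector in sector_of.items():
        members[sector].append(stock)
    n = len(members)
    return {stock: 1.0 / (n * len(members[sector]))
            for stock, sector in sector_of.items()}

# Hypothetical two-sector universe: the Bank stocks each get
# 1/(2*2) = 0.25, the single Retail stock 1/(2*1) = 0.5.
w = index_weights({"AAA": "Banks", "BBB": "Banks", "CCC": "Retail"})
```

By construction the weights sum to one, and each sector receives the same total weight regardless of how many stocks it contains.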

Figure 3.1: Index construction

The advantage of building the index in this way is that neither a sector nor a stock is preferred based on its market capitalization, as in a market-capitalization weighting. This should prevent small-cap and sector biases.

3.1.2 Portfolio Construction

The portfolio is constructed as similarly as possible to the index to avoid any biases. Instead of equal-weighting all stocks on a sector level, we only buy one stock per sector, based on the forecasts of our model. To outperform the index, these stocks are the ones the model identifies as the best stock in each sector. This means we have 34 stocks per portfolio. It would be possible to hold more stocks of each sector, but by only holding one stock per sector at a time, we reduce diversification and thus highlight the functionality of the underlying prediction model. These 34 stocks are equal-weighted at the beginning on a portfolio level, similar to the index.
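The one-stock-per-sector selection can be sketched as a simple argmax over model forecasts. The function name and the toy forecasts are hypothetical; the thesis code is not shown here.

```python
def pick_portfolio(forecasts: dict, sector_of: dict) -> dict:
    """Map each sector to the stock with the highest forecast score."""
    best = {}
    for stock, score in forecasts.items():
        sector = sector_of[stock]
        if sector not in best or score > forecasts[best[sector]]:
            best[sector] = stock
    return best

# Hypothetical forecasts: BBB is the best Bank, CCC the only Retail stock.
picks = pick_portfolio(
    {"AAA": 0.02, "BBB": 0.05, "CCC": -0.01},
    {"AAA": "Banks", "BBB": "Banks", "CCC": "Retail"},
)
```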

Figure 3.2: Portfolio construction

Since our predictions change over time, the stocks in each sector index can also change accordingly. If this happens, the stock that was held for that sector at the time is sold, and a new stock enters the sector portfolio. Therefore, we have to incorporate reinvestments. The buy-and-hold strategy of the index does not allow any flows from one sector to another, thus we reinvest similarly for the portfolio: all proceeds from the sale of a stock are completely reinvested in a new stock from the same sector. This implies that over time the weights of the sectors might change in the portfolio and the index. The portfolio is re-balanced every time our prediction model indicates a change in future stock performance, which usually happens whenever a company reports new fundamental data. All models exclude trading costs and other non-ideal conditions for both the portfolio and the benchmark index. This is a fair assumption as long as we keep the conditions equal for the portfolio and the index. The model should be considered a theoretical construct to test the prediction boundaries of modern stock markets rather than an actual investment strategy at this point; however, it could easily be implemented as such.

3.2 Prediction Metrics

Several performance indicators can be considered to estimate the attractiveness of a stock compared to its peers. The following section introduces three performance indicators that we use as prediction metrics for the experiments.

Log Return Difference

With the goal to outperform a benchmark, the obvious choice is a metric that measures the outperformance of a stock against its sector index directly. Therefore, we define the first metric as the quarterly average of the daily logarithmic return differences of a stock versus its sector index for a given quarter:

LR(Q_n^S) = (1/T) Σ_{t=1}^{T} [ log(P_{t+1}^S / P_t^S) − log(P_{t+1}^I / P_t^I) ],    (3.2)

where S is a stock and I the stock's sector index. P_t^S is the price of stock S at day t, and P_t^I is the price of index I at day t, respectively. We define Q_n^S as the nth quarter of stock S, where t ∈ Q_n^S.
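A minimal sketch of Eq. (3.2): averaging daily log-return differences over a quarter. The helper name and the four-day toy price series are hypothetical.

```python
import numpy as np

def log_return_difference(stock_prices: np.ndarray, index_prices: np.ndarray) -> float:
    """Quarterly average of the daily log-return differences between
    a stock and its sector index, as in Eq. (3.2)."""
    stock_lr = np.diff(np.log(stock_prices))
    index_lr = np.diff(np.log(index_prices))
    return float(np.mean(stock_lr - index_lr))

# Hypothetical quarter of daily closing prices (illustrative values only).
stock = np.array([100.0, 101.0, 103.0, 102.5])
index = np.array([100.0, 100.5, 101.0, 101.2])
lr = log_return_difference(stock, index)  # positive -> the stock outperformed
```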

Information ratio

A ratio often used to measure the performance of portfolio managers is the Information ratio, which sets a portfolio's outperformance in relation to the tracking error between portfolio and benchmark (Kidd, 2011):

IR(Q_n^S) = ( R_{Q_n}^S − R_{Q_n}^I ) / σ_{Q_n}^S,    (3.3)

where S is a stock and I the stock's sector index. P_t^S is the price of stock S at day t, and P_t^I is the price of index I at day t, respectively. We define Q_n^S as the nth quarter of stock S, where t ∈ Q_n^S. Additionally, we define r_t^{S/I} as the daily return of S or I for day t, R_{Q_n}^S as the cumulative return of the stock and R_{Q_n}^I as the cumulative return of the index over the quarter Q_n. Similarly, σ_{Q_n}^S is the tracking error, defined as the quarterly variance of daily return differences between the portfolio and the index.
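A sketch of Eq. (3.3), following the text's definition of the tracking error as the variance of daily return differences (the more common convention uses the standard deviation). The function name and toy prices are hypothetical.

```python
import numpy as np

def information_ratio(stock_prices: np.ndarray, index_prices: np.ndarray) -> float:
    """Eq. (3.3): quarterly excess return of the stock over its sector
    index, divided by the tracking error (here the variance of daily
    return differences, as defined in the text)."""
    rs = np.diff(stock_prices) / stock_prices[:-1]
    ri = np.diff(index_prices) / index_prices[:-1]
    excess = (np.prod(1 + rs) - 1) - (np.prod(1 + ri) - 1)
    tracking_error = np.var(rs - ri)
    return float(excess / tracking_error)

# Hypothetical quarter of daily closing prices.
stock = np.array([100.0, 101.0, 103.0, 102.5])
index = np.array([100.0, 100.5, 101.0, 101.2])
ir = information_ratio(stock, index)  # positive: the stock beat its index
```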

Sharpe ratio

One of the most common performance indicators for fund performance is the Sharpe ratio, which was introduced by William F. Sharpe in 1966. The Sharpe ratio, a risk-adjusted performance indicator of a security or portfolio, can be used independently of a stock benchmark. It measures the outperformance of a security compared to the risk-free rate, divided by the security's volatility:

SR(Q_n^S) = ( R_{Q_n}^S − R_F ) / σ_{Q_n}^S,    (3.4)

where S is a stock. We define Q_n^S as the nth quarter of stock S, where t ∈ Q_n^S. Additionally, we define r_t^S as the daily return of S on day t, R_{Q_n}^S as the cumulative return over a quarter Q_n, and R_F as the quarterly risk-free rate. Similarly, σ_{Q_n}^S is the volatility, defined as the quarterly variance of daily returns.
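A sketch of Eq. (3.4), with the volatility term taken as the variance of daily returns, as defined in the text. The function name, the toy prices and the 0.5% quarterly risk-free rate are hypothetical placeholders.

```python
import numpy as np

def sharpe_ratio(prices: np.ndarray, quarterly_rf: float) -> float:
    """Eq. (3.4): cumulative quarterly return minus the quarterly
    risk-free rate, divided by the quarterly variance of daily returns."""
    r = np.diff(prices) / prices[:-1]
    cum_r = np.prod(1 + r) - 1
    return float((cum_r - quarterly_rf) / np.var(r))

# Hypothetical quarter of daily closing prices; 0.5% assumed risk-free rate.
prices = np.array([100.0, 101.0, 103.0, 102.5])
sr = sharpe_ratio(prices, quarterly_rf=0.005)
```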

Verification of performance indicators

Before we can use these performance indicators in the prediction model, it should be verified that all of them actually indicate outperformance of stocks if predicted correctly. Therefore, we construct three portfolios using the different performance indicators, and an index, according to the process of Chapter 3.1. For each of the three portfolios (Log Return Difference, Information ratio, Sharpe ratio) we choose the best stock per sector as the stock with the highest respective performance indicator at each day. Figure 3.3 and Table 3.1 provide the results of this analysis:

Figure 3.3: Performances of the portfolios based on the Log Return Difference, the Information ratio and the Sharpe ratio as defined above, assuming knowledge of the metrics in the future, compared to the index constructed for our stock set.

Performance Indicator | Log Return Difference | Information ratio | Sharpe ratio | Index
α p.a.                | 326%                  | 247%              | 205%         | -
Sharpe ratio          | 3.37                  | 3.63              | 3.43         | 0.73
Information ratio     | 3.55                  | 4.07              | 3.73         | -

Table 3.1: Performance indicators for the three different portfolios and the index. The portfolios clearly outperform the benchmark index, suggesting that the performance indicators are good prediction metrics.

As we can see in Figure 3.3 and Table 3.1, all three portfolios outperform the benchmark index significantly, showing that the performance indicators are good prediction metrics if predicted correctly. The Log Return Difference portfolio clearly outperforms the benchmark, with an annualised alpha versus the benchmark of 326%. The Sharpe and Information ratios are also significantly higher than for the sector index. Looking at the extraordinary returns, it is obvious that this strategy could not actually be traded, for liquidity reasons, even if one could perfectly predict the metrics, as assumed in this example.

The Information ratio tries to assess how much of a portfolio's outperformance is actually generated by the expertise of a portfolio manager rather than by increased risk taking, which is controlled for by the tracking error. Therefore, we would expect high risk-adjusted returns for the Information ratio portfolio. As we can see in Figure 3.3, if we build a portfolio by choosing the stock with the highest Information ratio from each sector for each quarter, this portfolio also results in a high alpha versus its benchmark, with especially high Sharpe and Information ratios. Since the Sharpe ratio measures risk-adjusted returns against the risk-free rate, in theory it is not obvious that this directly leads to an absolute outperformance against the stock benchmark. Instead, the ratio could deliver a low-volatility portfolio with low returns, which could also result in a good Sharpe ratio. The figure suggests that in practice both the risk-adjusted and the absolute returns are preferable to the benchmark. The previous portfolios have shown that all prediction metrics are good indicators to identify outperforming stocks. The Log Return Difference metric seems to be the best indicator for alpha-generating stocks, which also have good but lower risk-adjusted returns than the other metrics. This could have been expected, because the Log Return Difference metric is the most direct measure of outperformance. The Information ratio prediction metric provides the highest Information ratio of a portfolio, with an alpha smaller than for the Log Return Difference but higher than for the Sharpe ratio. This can be explained by the fact that, in contrast to the Sharpe ratio, the Information ratio incorporates the outperformance of a stock. The Sharpe ratio is the only metric which does not measure any outperformance versus the stock index, but rather measures absolute risk-adjusted returns. Therefore, it seems legitimate that the Sharpe ratio portfolio has the lowest alpha of the three metrics.
Although one would expect it to have the highest Sharpe ratio, this cannot be verified by the simulation. On the contrary, its Sharpe ratio is lower than for the Information ratio portfolio. A possible reason for this discrepancy between expectation and result could be that the model uses the highest Sharpe ratio for each quarter to build the portfolio instead of optimizing the Sharpe ratio for the 30-year period. Overall, all three performance metrics are good indicators of the outperformance of a stock against its benchmark and from here on will be used for the experiments of Chapter 4.

3.3 Ensemble Properties of the Financial Ratios

After defining indicators for portfolio performance, it is also worth analyzing the performance drivers, namely the firms' fundamental data, in detail. The following sections contain noteworthy examples of those fundamentals to illustrate the format of the data and its relation to the firms' performance. Unfortunately, we can display the plots for all fundamentals neither in the main part nor in the appendix of this thesis, since the fundamentals number in the thousands. Therefore, the following illustrations should be understood as filtered and selected samples.

3.3.1 Distributions

The first analysis shows the distributions of fundamentals on a sector level (see Appendix B for all distributions). To generate the distributions, we find intersecting fundamentals for a sector, meaning all fundamentals that each of the stocks in a given sector reports. For each of the intersecting fundamentals we collect all reported fundamental values of all stocks within a sector over the 30-year period and plot their distributions in the following histograms.
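The intersection step can be sketched as a set intersection over the fundamentals each stock reports. The function name and the toy sector of three stocks are hypothetical; the field names mimic the "basic-normalized-eps" naming used below.

```python
from functools import reduce

def intersecting_fundamentals(reported: dict) -> set:
    """Fundamentals that every stock in the sector reports."""
    return reduce(set.intersection, reported.values())

# Hypothetical sector: only fields reported by all three stocks survive.
reports = {
    "AAA": {"basic-normalized-eps", "current-ratio", "pe-ratio"},
    "BBB": {"basic-normalized-eps", "pe-ratio"},
    "CCC": {"basic-normalized-eps", "pe-ratio", "dividends"},
}
common = intersecting_fundamentals(reports)
```

The values collected for each surviving fundamental across all stocks and quarters are then what gets binned into the histograms below.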

Earnings per Share

(a) Aerospace and Defense (b) Automobiles and Parts

(c) Banks (d) Construction and Materials

Figure 3.4: Distributions of Earnings per Share for selected sectors.

The first set of histograms displays the distribution of the "basic-normalized-eps", which is a normalized measure of the earnings per share, for four different sectors. Since the earnings per share have been normalized, they can easily be compared among different stocks and even sectors.

This specific fundamental shows similarities between different sectors. Especially the "Aerospace and Defense" and the "Automobiles and Parts" distributions, which both belong to manufacturing industries, have a comparable range of earnings, and both peak around the same value. The "Banks" sector has the longest tails of all four distributions, ranging from -6 to 6, whereas the "Construction and Materials" sector seems more defensive and has few negative earnings to report.

Current ratio

(a) Aerospace and Defense (b) Automobiles and Parts

(c) Banks (d) Construction and Materials

Figure 3.5: Distributions of Current ratios for selected sectors.

The Current ratio is used to assess the liquidity and short-term creditworthiness of a firm. It is calculated by dividing current assets by current liabilities, which must result in a value larger than or equal to zero. A high ratio implies high liquidity of a company.

Although the distributions follow a similar pattern, one can see that the Current ratios vary significantly across sectors. Especially the "Automobiles and Parts" sector has on average good Current ratios, whereas the "Fixed Line Telecommunications" sector is generally associated with less liquidity. In fact, most Current ratios of the "Fixed Line Telecommunications" sector are below 1, which indicates that firms have difficulties meeting their short-term obligations.

Dividends

(a) Aerospace and Defense (b) Automobiles and Parts

(c) Banks (d) Construction and Materials

Figure 3.6: Distributions of Dividends for selected sectors.

The plots above show the distributions of dividends that were paid to shareholders. Since dividends are negative cash flows for the distributing firm, the fundamental values are negative by definition. We can see that in general most payouts are small or zero. Nevertheless, the tails of the distributions are long. The magnitude of the dividends is relatively consistent for the first three sectors. In general, dividends can vary with the size of the firms in a sector, as can be seen for the "Construction and Materials" sector. The dividend fundamental therefore differs from the earnings per share or the Current ratio because it is not normalized.

P/E ratio

(a) Aerospace and Defense (b) Automobiles and Parts

(c) Banks (d) Construction and Materials

Figure 3.7: Distributions of Price/Earnings ratio for selected sectors.

A concluding example we want to study is the Price-to-Earnings ratio. The P/E ratio is one of the most common fundamentals that investors analyze to evaluate stock prices. It measures at what multiple of its earnings a stock is valued and currently traded. Therefore, everything else equal, a lower P/E ratio typically indicates an undervalued stock. Since earnings can potentially be negative or zero in the short run, P/E ratios can theoretically be negative or go to infinity, but those scenarios are often excluded, as in this case, and thus the distributions start at zero. The P/E ratio distributions for the "Aerospace and Defense" sector and the "Fixed Line Telecommunications" sector resemble each other. They both peak around a ratio of 20, but also have multiple outliers at the upper end. The "Banks" sector is characterized by a low peak in P/E ratios and also has lower maximum P/E ratios. The P/E ratios are consequently sector-specific.

3.3.2 Joint Distributions

The distribution analysis has provided valuable insights into the characteristics of different sectors and fundamentals, but was not able to show a relation between fundamental data and a firm's performance. This section studies joint distributions of sector fundamentals and the corresponding stock performances during the quarter. We can reuse the fundamental data distributions from Chapter 3.3.1 and add a performance indicator. As the performance indicator for a stock's fundamental value we use the price return, defined as the price increase measured in percent, of the respective stock over the quarter that followed the publication of the fundamental data. Then, we can plot the return of each stock in the sector for every quarter in the 30-year period over the reported fundamental data of the same quarter to see if specific values of fundamental data trigger a positive or a negative return. Figure 3.8 is an example of this process for the EV/EBITDA fundamental in the "Aerospace and Defense" sector and shows that there is a negative correlation between the multiple and stock performance.

Figure 3.8: Scatter plot with trend line of returns over the EV/EBITDA ratio values for the “Aerospace and Defense” sector.

In fact, we can see that there is a trend between the fundamental, in this case the Enterprise Value divided by the EBITDA, and the quarterly return of the stocks in the "Aerospace and Defense" sector. The regression coefficient is -0.6%, which means that for a one-unit increase of the multiple, the return falls by 0.6% on average. Corporations that report high EV/EBITDA ratios, on average, have lower returns over the following quarter. This is intuitive, since firms with high valuations, expressed through a high EV/EBITDA, should tend to have lower performance than undervalued stocks. To study that behaviour in more detail and verify whether the trends we see are actually significant, we split the plot above into 6 bins and study the relation between fundamental data and return for each of the bins individually. For each bin we run a linear regression through the data points and highlight the bins with significant trends. To check whether a trend is significant, we conduct a statistical test for each bin. We set the null hypothesis as:

H0: The return is independent of the fundamental data.    (3.5)

To test the null hypothesis, we randomly shuffle the return values against the fundamental data, so that the return data of a quarter no longer matches its fundamental data. Then we rerun the regression and note the slope of the regression. This process is repeated 1000 times. We then compare the slope of the original regression with the randomized simulations' slopes. For a two-sided test with a significance level of 5%, we reject the null hypothesis if the slope of the original regression is larger/lower than for the 50 highest/lowest random simulations. The statistics concerning these tests for all joint distributions can be found in the Appendix.
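The shuffle procedure described above can be sketched as a permutation test on the regression slope. The function names and the synthetic data are hypothetical (the data mimics the -0.6% slope of Figure 3.8); reporting the fraction of shuffled slopes at least as extreme as the observed one is equivalent, at the 5% level, to the 50-highest/50-lowest rule in the text.

```python
import numpy as np

def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope of a simple linear regression of y on x."""
    return float(np.polyfit(x, y, 1)[0])

def shuffle_test(x: np.ndarray, y: np.ndarray, n_shuffles: int = 1000,
                 seed: int = 0) -> float:
    """Two-sided permutation test of H0: the return is independent of
    the fundamental. Returns an empirical p-value: the fraction of
    shuffled slopes at least as extreme as the observed slope."""
    rng = np.random.default_rng(seed)
    observed = abs(ols_slope(x, y))
    null = np.array([ols_slope(x, rng.permutation(y)) for _ in range(n_shuffles)])
    return float(np.mean(np.abs(null) >= observed))

# Synthetic data with a genuine negative trend plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 30, 200)                # e.g. EV/EBITDA values
y = -0.006 * x + rng.normal(0, 0.02, 200)  # quarterly returns
p = shuffle_test(x, y)                     # small p -> reject H0 at the 5% level
```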

The bins that show a significant trend according to that model are highlighted:

Enterprise Value/EBITDA

(a) Aerospace and Defense

(b) Food Producers

(c) Life Insurance

(d) Software and Computer Services

Figure 3.9: Joint distributions of selected sectors for the EV/EBITDA multiple. The graphs have been split to analyze behaviour for shorter intervals of the EV/EBITDA multiple. Bins with significant trends according to a t-test have been highlighted with blue points and a red line.

The first joint distribution illustrates the relationship between the returns of several sectors and the Enterprise Value/EBITDA multiple. As we have already seen, the returns negatively correlate with the ratio. This is especially true for the first bin: for small values of the multiple, an increase leads to a more pronounced reduction in returns. This could be because we use a linear x-axis for the fundamental values, so relative changes of the ratio are larger for small values.

P/E ratio

(a) Financial Services

(b) Food Producers

(c) Household Goods and Home Construction

(d) Personal Goods

Figure 3.10: Joint distributions of selected sectors for the Price/Earnings ratio. The graphs have been split to analyze behaviour for shorter intervals of the P/E ratio. Bins with significant trends according to a t-test have been highlighted with blue points and a red line.

The second example displays the influence of the P/E ratio on the return. The P/E ratio, similarly to the EV/EBITDA multiple, sets the value of a stock in relation to its profits. Therefore, it is not surprising that the behaviour is comparable. Overall, a rising P/E ratio is associated with lower returns, especially in the first bins. The “Food Producers” sector shows particularly strong dependencies.

3.3.3 Canonical Averaging of the Volatility and Return Time Series conditional on Financial Ratios

To further sharpen the picture, we can add a time axis to the previous representations. Instead of plotting quarterly returns over fundamentals, we plot the cumulative return over the first days of each quarter against the fundamentals. The cumulative return for a given quarter starts at zero on the day of the fundamental publication and is then computed for the following 40 days. The cumulative return is defined as:

R_C(t) = \prod_{n=0}^{t} r(n),   (3.6)

where r(n) is the daily return of day n and r(0) = 1. The x-axis displays the sorted fundamental values, the y-axis shows the days for which the return is calculated, and the coloring indicates the magnitude of the cumulative returns, ranging from blue for low returns to red for high returns. To make the heat maps easier to read, the scattered data has been binned and interpolated. Additionally, we can create similar plots for the volatility of the stock returns by replacing the cumulative returns with moving volatilities. The moving volatility for a given day is defined as the standard deviation of the daily returns over a range from two days before to two days after that day:

\sigma^2(t) = \mathrm{Var}\left(r(t-N), \dots, r(t), \dots, r(t+N)\right)   (3.7)

with N = 2. Again, the x-axis displays the sorted fundamental values, the y-axis shows the days for which the moving volatility is calculated, and the coloring indicates the magnitude of the volatility, ranging from blue for low volatility to red for high volatility. The advantage of the heat maps is that we are able to study the return time series during the quarter right after new fundamental data was reported. The goal of this study is to identify specific trends and performance behaviours that are triggered by specific fundamental values and could be useful insights to incorporate into the portfolios.
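The two quantities behind the heat maps can be sketched in a few lines. This is a minimal illustration assuming gross daily returns as input; the synthetic series and variable names are illustrative, and the edge handling at the window boundaries is one possible choice:

```python
import numpy as np

def cumulative_return(daily_returns):
    """R_C(t) = prod_{n=0}^{t} r(n) with r(0) = 1 (Eq. 3.6).

    `daily_returns` holds the gross daily returns r(1), ..., r(T)
    of the first days after the fundamental publication.
    """
    r = np.concatenate(([1.0], np.asarray(daily_returns)))
    return np.cumprod(r)

def moving_volatility(daily_returns, N=2):
    """Std. deviation over the window r(t-N), ..., r(t+N) (Eq. 3.7), N = 2.

    The window is truncated at the series boundaries (an assumption;
    the thesis does not specify the edge treatment).
    """
    r = np.asarray(daily_returns)
    return np.array([r[max(0, t - N): t + N + 1].std()
                     for t in range(len(r))])

# 40 trading days after a hypothetical reporting date
rng = np.random.default_rng(0)
r = 1 + rng.normal(0.0005, 0.01, 40)   # gross daily returns
print(cumulative_return(r)[-1])        # cumulative return after 40 days
print(moving_volatility(r)[:3])
```

Computing both series per stock and quarter, sorting along the fundamental axis, and binning yields the heat map grids shown in Figures 3.11 and 3.12.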

Dividend yield

Figure 3.11: Heat maps displaying the return and volatility behaviour for the first 40 days after reporting seasons, in addition to the reported dividend yields.

The heat maps in Figure 3.11 display the cumulative return and the volatility for dividend yields. One can see that there is a strong dependence between the cumulative return and the volatility of the stocks, which is not surprising given our definition of the short-term volatility. Interestingly, we see that high cumulative returns often come in cycles in which high-return days are followed by lower-return days. Even overall well-performing quarters therefore do not accumulate their return steadily, but rather gain and lose it at the beginning of the quarter until an equilibrium is eventually reached. The “Financial Services” sector is a good example of this phenomenon. Ultimately, we do see that there are particular dividend yields for which each sector performs well; those values are not necessarily the highest dividend yields.

Revenue/Total Assets

Figure 3.12: Heat maps displaying the return and volatility behaviour for the first 40 days after reporting seasons, in addition to the reported Revenue/Asset multiples.

The implications of the Revenue/Total Assets plots are in line with expectations: high revenue ratios indicate good future performance. Similar to the example above, we see a cyclical behaviour of the cumulative return over time. The days on which the local maxima and minima occur are even consistent across different sectors. Again, we see that the cumulative return behaves in cycles and markets need time to find their equilibrium price. This seems to contradict the efficient market hypothesis, which states that markets quickly find an equilibrium after new information has been made available. It should be noted, however, that since we do not know which other factors drove the performance of the stocks over that time, and the cycles might have been caused by influences that do not appear in the fundamentals, there could be plenty of explanations that justify this behaviour even under the efficient market hypothesis.

3.4 Copula Analysis

3.4.1 Introduction

Finally, we use a copula analysis to study the relationships between different fundamentals, and between fundamentals and future returns. Copulas can be used to model “the dependence structures between stochastic variables” (Papaioannou et al., 2016) and have gained popularity in the financial industry for risk management and asset pricing. In particular, the Gaussian copula was widely used for pricing before the Great Financial Crisis (Li, 2000) and has been shown to be a valid model for stock return dependencies by Sornette et al. (2003). We do not model the return dependencies between different assets but between fundamentals and assets, as well as between pairs of fundamentals, and it is not yet clear which copula is the best fit for the data set, as several copulas could potentially fit the data. In general, copulas can be defined as multivariate distribution functions with uniform one-dimensional marginal distributions (Nelsen, 1999). This analysis focuses on bivariate copulas only, which are defined as follows. If

F_{XY}(x, y) = C\left(F_X(x), F_Y(y)\right),   (3.8)

where X, Y are random variables, F_X(x), F_Y(y) are the marginal distribution functions and F_{XY}(x, y) is the joint distribution function of X, Y, then the copula is the function C that satisfies C(t, 0) = C(0, t) = 0 and C(t, 1) = C(1, t) = t for any t ∈ [0, 1] (Papaioannou et al., 2016).

Gaussian Copula

For the Gaussian Copula, this can be rewritten as

C(u, v; ρ) := \Phi_2\left(\Phi^{-1}(u), \Phi^{-1}(v); ρ\right),   (3.9)

where \Phi is the distribution function of the standard normal distribution and \Phi_2 is the distribution function of the bivariate standard normal distribution with correlation ρ. A Gaussian copula is completely defined by its correlation matrix ρ and is not tail-dependent (Sornette et al., 2003; Meyer, 2013).

Student Copula

Another copula that we will fit to the data is the Student copula, which can be defined for bivariate distributions as

C(u, v; θ) = G\left(t_v^{-1}(u), t_v^{-1}(v); θ\right)
= \int_{-\infty}^{t_v^{-1}(u)} \int_{-\infty}^{t_v^{-1}(v)} \frac{1}{2\pi \left(1 - θ^2\right)^{1/2}} \left\{ 1 + \frac{x^2 - 2θxy + y^2}{v \left(1 - θ^2\right)} \right\}^{-(v+2)/2} dy\,dx,   (3.10)

where θ is the linear correlation coefficient, v is the degree of freedom of the t-distribution, and t_v(.) denotes its CDF (Salleh et al., 2016).

Important measures to identify the dependence between two distributions are correlation coeffi- cients. We define the most common correlation coefficients associated with copulas in the following sections.

Pearson’s Correlation Coefficient

Pearson’s coefficient is the most common correlation coefficient and is calculated by

ρ_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)}\,\sqrt{\mathrm{Var}(Y)}} = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - (E[X])^2}\,\sqrt{E[Y^2] - (E[Y])^2}},   (3.11)

where Cov(.), Var(.) and E[.] denote the covariance, the variance and the expectation respectively, and X, Y are a pair of random variables. The coefficient is, however, a linear correlation coefficient: “since linear correlation is not a copula-based measure of dependence, it can often be quite misleading and should not be taken as the canonical dependence measure.” (Embrechts et al., 2001)

Kendall’s tau

Kendall's tau is a correlation metric based on concordance and discordance. We use the definition of concordance and discordance from the “Encyclopedia of Mathematics” (Nelsen, 2001). If (x_j, y_j) and (x_k, y_k) are two elements of a sample {(x_i, y_i)}_{i=1}^{n} from a bivariate population, one says that (x_j, y_j) and (x_k, y_k) are concordant if x_j < x_k and y_j < y_k or if x_j > x_k and y_j > y_k, and discordant if x_j < x_k and y_j > y_k or if x_j > x_k and y_j < y_k.

Then tau can be defined as:

τ = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\binom{n}{2}},   (3.12)

where \binom{n}{2} is the binomial coefficient.

Spearman’s rho

Spearman’s rho is the Pearson coefficient of rank variables and is defined as

ρ_S = \frac{\mathrm{Cov}(\mathrm{rg}(X), \mathrm{rg}(Y))}{\sqrt{\mathrm{Var}(\mathrm{rg}(X))}\,\sqrt{\mathrm{Var}(\mathrm{rg}(Y))}},   (3.13)

where rg(.) is the rank of a variable and X, Y are random variables. Kendall's tau and Spearman's rho are preferable correlation coefficients for the copula analysis and will be calculated alongside the linear Pearson correlation for the following copulas (Embrechts et al., 2001).
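All three coefficients are available in scipy.stats; a small sketch with synthetic data, where x and y are illustrative stand-ins for a fundamental series and the corresponding returns:

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins: x for a fundamental ratio, y for quarterly returns.
rng = np.random.default_rng(42)
x = rng.normal(size=500)
y = 0.3 * x + rng.normal(size=500)

pearson, _ = stats.pearsonr(x, y)    # linear correlation (Eq. 3.11)
tau, _ = stats.kendalltau(x, y)      # concordance-based (Eq. 3.12)
rho_s, _ = stats.spearmanr(x, y)     # rank correlation (Eq. 3.13)
print(pearson, tau, rho_s)
```

For monotone but non-linear dependence the rank-based measures remain stable while Pearson's coefficient can be misleading, which is exactly the point made by Embrechts et al. (2001).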

Kolmogorov-Smirnov distance

The Kolmogorov-Smirnov distance can be used to measure the fit between two distributions. The distance describes the deviation between an empirical distribution of actual observations and a hypothesized distribution (Kole et al., 2007). In our case, the empirical distribution is the cumulative distribution function of our observations and the hypothesized distribution is the function of the fitted copula. The distance can be calculated as:

D_{KS} = \max_t \left| F_E(x_t) - F_H(x_t) \right|   (3.14)

and the average as:

D_{KS}^{a} = \int_x \left| F_E(x) - F_H(x) \right| \, dF_H(x),   (3.15)

where F_E is the empirical distribution function and F_H is the hypothesized distribution.

Anderson-Darling distance

Another distance measure to estimate the fit between distributions is the Anderson-Darling distance (Anderson and Darling, 1952). The Anderson-Darling distance gives more weight to the tails of the distribution than the Kolmogorov-Smirnov distance. The distance is defined as:

D_{AD} = \max_t \frac{\left| F_E(x_t) - F_H(x_t) \right|}{\sqrt{F_H(x_t)\left(1 - F_H(x_t)\right)}}   (3.16)

and its average as:

D_{AD}^{a} = \int_x \frac{\left| F_E(x) - F_H(x) \right|}{\sqrt{F_H(x)\left(1 - F_H(x)\right)}} \, dF_H(x),   (3.17)

where again F_E is the empirical distribution function and F_H is the hypothesized distribution.
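Both maximum distances can be computed directly from the sorted sample. A sketch assuming the hypothesized distribution is supplied as a CDF callable; the function names, the clipping constant for the tail weight, and the step-function convention for the empirical CDF are illustrative choices:

```python
import numpy as np
from scipy import stats

def ks_and_ad_distance(sample, hypothesized_cdf):
    """Maximum Kolmogorov-Smirnov (Eq. 3.14) and Anderson-Darling
    (Eq. 3.16) distances between the empirical CDF of `sample` and a
    hypothesized CDF."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    F_E = np.arange(1, n + 1) / n      # empirical CDF at the sorted points
    F_H = hypothesized_cdf(x)
    diff = np.abs(F_E - F_H)
    d_ks = diff.max()
    # The AD weight emphasizes the tails; clip to avoid division by zero.
    w = np.sqrt(np.clip(F_H * (1 - F_H), 1e-12, None))
    d_ad = (diff / w).max()
    return d_ks, d_ad

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
d_ks, d_ad = ks_and_ad_distance(sample, stats.norm.cdf)
print(d_ks, d_ad)
```

Since the weight \sqrt{F_H(1 - F_H)} never exceeds 1/2, the Anderson-Darling distance is always at least as large as the Kolmogorov-Smirnov distance for the same sample.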

3.4.2 Calibration

For the practical implementation of the copula analysis we use the Python copulae package, which allows fitting data to several copulas, including the Gaussian and Student copulas, and also provides additional statistics such as a goodness-of-fit test, which we use to filter the copulas. Since we have several features per sector whose dependencies can be studied both pairwise and in relation to the sector returns, the number of copulas we could potentially create reaches into the thousands. We can therefore display neither all copulas in the thesis nor in the appendix, and will only provide a selection of the best copulas in terms of fit and correlation. We focus on the Gaussian copula, which has been shown to be a good choice for the study of financial assets (Sornette et al., 2003). Nevertheless, since the data set is so large, there are many features that fit several copulas. The following chapter provides a selection of informative copulas. For each copula we provide Pearson's correlation coefficient ρ, which completely defines a Gaussian copula, Kendall's tau, Spearman's rho and the p-value of the goodness-of-fit test. The first copulas display the relationships between fundamentals, and the second part studies the influence of fundamentals on the future returns of a given sector. For the latter, we again match the fundamentals and returns by quarters as in Chapter 3.3.2 for the joint distributions.
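Since a bivariate Gaussian copula is fully determined by its correlation parameter, the core of the calibration reduces to estimating that parameter from pseudo-observations, which is essentially what a copula-fitting package does internally. A sketch using only numpy/scipy (the data and all names are illustrative, and this is a simplification of the package's maximum-likelihood fit):

```python
import numpy as np
from scipy import stats

def fit_gaussian_copula(x, y):
    """Estimate the correlation parameter of a bivariate Gaussian copula.

    Steps: map each margin to pseudo-observations in (0, 1) via ranks,
    transform to normal scores, then estimate their correlation.
    """
    n = len(x)
    u = stats.rankdata(x) / (n + 1)    # pseudo-observations of margin X
    v = stats.rankdata(y) / (n + 1)    # pseudo-observations of margin Y
    z_u, z_v = stats.norm.ppf(u), stats.norm.ppf(v)
    return np.corrcoef(z_u, z_v)[0, 1]

# Correlated synthetic 'fundamental' and 'return' series
rng = np.random.default_rng(7)
x = rng.normal(size=2000)
y = 0.5 * x + np.sqrt(0.75) * rng.normal(size=2000)
rho = fit_gaussian_copula(x, y)
print(round(rho, 2))
```

Because the fit runs on ranks, it is invariant under monotone transformations of the margins, which is precisely why copulas separate the dependence structure from the marginal distributions.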

Gaussian Copulas between Fundamentals

(a) Financial Services (b) Industrial Engineering

(c) Industrial Transportation (d) Life Insurance

Figure 3.13: Gaussian copulas for reinvestment rate and return-on-assets/return-on-equity. Additionally, the Pearson's, Kendall's and Spearman's correlation coefficients are displayed. The null hypothesis of the goodness-of-fit test is that the copulas are fitting.

(a) Household Goods and Home Construction (b) Industrial Engineering

(c) Life Insurance (d) Nonlife Insurance

Figure 3.14: Gaussian copulas for reinvestment rate and EBITDA. Additionally, the Pearson's, Kendall's and Spearman's correlation coefficients are displayed. The null hypothesis of the goodness-of-fit test is that the copulas are fitting.

For the copulas between fundamentals, we can find several with high correlations and good fits. It should be noted, though, that many fundamentals, especially financial ratios, can be converted into each other and are therefore not just highly correlated, as the metrics would imply, but essentially identical. Since those fundamentals do not contain any additional information and the observed correlations are trivial, we checked all fundamental combinations and excluded copulas that measure identical fundamentals. The copulas above display the dependencies between the reinvestment rate of firms within a sector and profitability ratios of those firms, such as EBITDA multiples and return-on-equity/assets. We find that, in general, the reinvestment rate correlates with the profitability of the firms; the copulas for the ROE/ROA in particular show strong dependencies. This observation is not surprising, because reinvestments are usually financed from firms' profits, so the reinvestment rate can be increased when profits are higher. Unfortunately, the copulas cannot tell us about the underlying causalities: we do not know if the reinvestments are entirely driven by profits, or if firms with high reinvestment rates also tend to be more profitable in the long run.

Positive Correlations between Fundamentals and Returns

(a) Financial Services: (b) Fixed Line Telecomm: Reinvestment rate

(c) Nonlife Insurance: EBITDA/Equity (d) Nonlife Insurance: Reinvestment rate

(e) Oil and Gas Producers: Reinvestment rate (f) Personal Goods: Sales receivables

(g) Software & Computer Services: EBITDA/Assets (h) Support Services: Reinvestment rate

Figure 3.15: Gaussian copulas for fundamentals and positive returns. Additionally, the Pearson's, Kendall's and Spearman's correlation coefficients are displayed. The null hypothesis of the goodness-of-fit test is that the copulas are fitting.

Figure 3.15 shows copulas with positive correlations between fundamentals and stock returns. In general, the correlations between a single fundamental and the return of a stock are small, which is not surprising, because stock returns are driven by a large number of factors, of which fundamentals are just a small part. Nevertheless, several fundamentals show positive correlations with returns, such as dividends, reinvestment rates, EBITDA ratios and sales receivables. Encouragingly, the correlations between the fundamentals and the returns that we see above are in line with expectations:

• High dividends attract investors; • Firms that have high reinvestment rates are on a growth track; • High EBITDA ratios highlight the profitability of a firm; • Sales receivables indicate a high demand for a firm’s goods and services and thus future income.

All of the points above should lead to higher returns, as the copulas indicate (Graham and Dodd, 1934).

Negative Correlations between Fundamentals and Returns

We perform the same analysis for fundamentals that correlate negatively with future stock returns. The correlations between fundamentals and negative returns are higher in magnitude than for the positive correlations above: the highest positive correlation found is 0.17, whereas there are seven negative fundamental-return pairs with correlation magnitudes above that level. The relations are especially strong for P/E, EV/EBITDA and P/B ratios, as one can see in Figure 3.16 below. These correlations are also in line with expectations, since higher price multiples tend to indicate an overvaluation of stocks, which should result in lower returns. This seems to be especially true for the “Construction and Materials” sector.

(a) Construction and Materials: EV/EBITDA (b) Construction and Materials: P/E ratio

(c) Financial Services: P/E ratio (d) Fixed Line Telecomm: P/B ratio

(e) Life Insurance: P/B ratio (f) Nonlife Insurance: P/E ratio

(g) Personal Goods: EV/Revenue (h) Software and Computer Services: EV/Revenue

Figure 3.16: Gaussian copulas for fundamentals and negative returns. Additionally, the Pearson's, Kendall's and Spearman's correlation coefficients are displayed. The null hypothesis of the goodness-of-fit test is that the copulas are fitting.

3.5 Random Forest Model

3.5.1 Mathematical Formulation

Before we can use a random forest model for our experiments, we first need to provide a mathematical formulation of our problem: Let S be a stock with daily prices P(t), where t ranges over all days in the 30-year period and thus t ∈ [1, 7560]. We define a quarter q as a set of consecutive days t, and F(q) as the set of fundamentals of stock S reported just before quarter q. Therefore, all days t ∈ q follow the fundamental reporting. Further, we need to define a prediction target for the random forest predictor, which is usually a metric based on the prices of a stock, as we saw in Chapter 3.2, and is calculated per quarter. Therefore, we define our quarterly prediction target Γ(q) as a function of P(t):

Γ(q) = G(P(t)),   (3.18)

where G(.) can be the Log Return Difference, Information ratio or Sharpe ratio metric defined in Chapter 3.2, but could in principle be any function. To use the random forest model to forecast future stock performance, we need a training and a testing set based on different time periods. Therefore, we define a set of training quarters Q_train and a set of testing quarters Q_test:

Q_train = [q_1, ..., q_n]   (3.19)

and

Q_test = [q_{n+1}, ..., q_N]   (3.20)

Then we can define a training set Ψ, which includes the fundamental data F(Q_train) of the training quarters as features and the prediction metric Γ(Q_train) as targets:

Ψ = [F(Q_train), Γ(Q_train)] = [[F(q_1), ..., F(q_n)], [Γ(q_1), ..., Γ(q_n)]]   (3.21)

Similarly, we define the testing set Ω, which includes the fundamental data F(Q_test) of the testing quarters as features. The target set is empty, since it will be predicted by the random forest:

Ω = [F(Q_test)] = [[F(q_{n+1}), ..., F(q_N)]]   (3.22)

In practice, we initiate a random forest model for each stock individually and feed it the set Ψ for training, followed by the testing set Ω to forecast future stock performance:

\Psi \xrightarrow{\text{training}} \mathrm{RF} \xrightarrow{\text{testing}} \Omega.   (3.23)

This process can be repeated for each stock in the data set and for different sets of training and testing quarters.

3.5.2 Random Forest Regressor

Definition

A simple but effective machine learning technique is the random forest, which can be used for regression and classification purposes. Random forests are an averaging ensemble method or, more precisely, bagging estimators, as originally developed by Breiman, who defines a random forest classifier as follows: “A random forest is a classifier consisting of a collection of tree-structured classifiers {h(x, Θ_k), k = 1, ...} where the {Θ_k} are independent identically distributed random vectors and each tree casts a unit vote for the most popular class at input x.” (Breiman, 2001) This can be adapted to a random forest regressor by averaging the predictions of the trees instead of using unit votes.

The random forest algorithm can be split into three steps, according to Liaw (2002):

1. Draw n_tree bootstrap samples from the data.

2. For each of the bootstrap samples, grow an unpruned classification or regression tree, with the following modification: at each node, rather than choosing the best split among all predictors, randomly sample m_try of the predictors and choose the best split from among those variables.

3. Predict new data by aggregating the predictions of the n_tree trees (majority vote for classification, average for regression).
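The three steps above can be sketched compactly by combining bootstrap sampling with scikit-learn's decision trees; `max_features` plays the role of m_try. This is an illustrative toy implementation with synthetic data, not the configuration used in the experiments:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def grow_forest(X, y, n_tree=25, m_try=3, seed=0):
    """Steps 1-2: draw n_tree bootstrap samples and grow one unpruned
    tree per sample, considering only m_try randomly chosen predictors
    at each split."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_tree):
        idx = rng.integers(0, len(X), len(X))          # bootstrap sample
        tree = DecisionTreeRegressor(max_features=m_try,
                                     random_state=int(rng.integers(1 << 31)))
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def predict_forest(forest, X):
    """Step 3: aggregate by averaging the trees' predictions (regression)."""
    return np.mean([tree.predict(X) for tree in forest], axis=0)

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 8))                 # 8 hypothetical fundamentals
y = X[:, 0] + 0.1 * rng.normal(size=200)      # hypothetical target metric
forest = grow_forest(X, y)
print(predict_forest(forest, X[:5]).shape)
```

The per-split feature subsampling decorrelates the trees, which is what lets the averaged ensemble reduce variance beyond plain bagging.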

An advantage of random forest models is that they do not tend to overfit, regardless of the number of trees simulated. Since we are not only interested in classifying the stock set but in predicting continuous performance indicators, we use a random forest regressor in all experiments except the first one, which is based on a random forest classifier. The results of a random forest regressor are more informative, since the forecasts can take any value instead of merely classifying the returns into predefined categories. The regressor's output can therefore be used to differentiate between predicted stock performances in more detail.

Implementation

The practical implementation is done with the Python package scikit-learn (Pedregosa et al., 2011). The package deviates from Breiman's original proposition to have each tree cast a unit vote in the random forest classifier and instead averages the probabilistic predictions of the trees for both the random forest classifier and the regressor. We implement the random forest regressor with 1000 estimators and bootstrap sampling, where the split quality is measured through the mean squared error. An example of a trained random forest tree can be found on the next page. The tree is based on a small subset of the data for illustration purposes only; an actual tree for the complete data set has significantly more branches and is not suited for display.

Figure 3.17: Example of a random forest decision tree for the stock of the Microsoft Corporation [NYSE: MSFT] that is the result of the training. Following the tree step-by-step, one eventually reaches a target prediction based on new input features. Each cell contains a classification equation that can be used to predict a new value.

3.5.3 Training and Testing Sets

The random forest model uses the fundamental data of a specific stock as features to predict a performance metric of the respective stock over a given quarter as the target. The data set, including fundamental and price data, is therefore split into a training set Ψ_S and a testing set Ω_S as defined in Chapter 3.5.1. The random forest model is fitted on the training set, and predictions are only made on the testing set to avoid biases. The data split can be done according to two different approaches:

1. Split the data set into a fixed training and testing set, e.g. using the first 70% of all quarters as the training set and the last 30% as the testing set.

Figure 3.18: Fixed training set, which is created by simply dividing the fundamental and return data for each stock at a given interval.

2. Create a dynamic training and testing set, e.g. each testing quarter is predicted by a model trained on the 20 preceding quarters, thereby also avoiding any biases. The latter method obviously needs more computation time, since more training runs and predictions have to be done and each quarter is predicted by its own random forest. Nonetheless, it is the more intuitive approach for practical application.

Figure 3.19: Dynamic training approach in which we only predict a single quarter at a time using the last N quarters. This method would be used if the algorithm were to be implemented in practice.
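The two splitting schemes can be sketched with hypothetical quarter indices (the counts 200, 70% and 20 follow the text; the function names are illustrative):

```python
def fixed_split(quarters, train_frac=0.7):
    """Approach 1: a single fixed split, e.g. first 70% train, last 30% test."""
    n = int(len(quarters) * train_frac)
    return quarters[:n], quarters[n:]

def dynamic_splits(quarters, window=20):
    """Approach 2: each testing quarter is trained on the `window`
    quarters that directly precede it."""
    for i in range(window, len(quarters)):
        yield quarters[i - window:i], quarters[i]

quarters = list(range(200))           # ~200 quarters per stock on average
train, test = fixed_split(quarters)
print(len(train), len(test))          # 140 60
first_train, first_test = next(dynamic_splits(quarters))
print(first_train[:3], first_test)    # [0, 1, 2] 20
```

Both schemes only ever train on quarters that precede the predicted quarter, so neither approach leaks future information into the model.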

Both approaches will be studied in detail in the following experiments.

Chapter 4

Results

4.1 Classification Experiment

Before building actual portfolios based on the random forest predictions, we begin with two simple classification experiments. The first tries to predict whether a stock will have a positive or negative return over a given quarter, thus generating profits or losses. The second tries to predict whether a stock will outperform or underperform its sector index over a given quarter. Both models use the complete fundamental data available for each stock as input features and try to predict a simple target of 1 or -1 for positive/negative performance or out-/underperformance respectively.

4.1.1 Profit and Loss Prediction

To identify negative and positive quarters for the training set, we calculate the cumulative return of a stock for each quarter, omitting the value and keeping only the sign of the return. The prediction targets can then take only two values: -1 and 1. We split each stock's data in half, using 50% as the training set and 50% as the testing set, to evaluate the performance of the random forest classifier. We then train the random forest on the whole training set and feed it the fundamentals from the whole testing set to predict the profits and losses for all testing quarters simultaneously. Afterwards, we check how often the random forest model predicted the sign of the return correctly. This process is repeated for all 222 stocks, which brings the total number of testing quarters to 23325, of which 11606 had positive returns and 11719 had negative returns. In 11823 cases the random forest classifier predicted the performance correctly, which equals 50.69% of the quarters. The classifier thus predicts correctly more often than incorrectly, although only by 0.69 percentage points. To check whether this is significant, we perform a one-sided t-test for different significance levels. The null hypothesis is that the random forest algorithm decides randomly whether a quarter has a positive or negative return. Thus the null hypothesis can be written as:

H_0: Prob(r > 0) = Prob(r < 0) = 0.5,   (4.1)

with significance levels α = [5%, 1%, 0.1%] and N = 1000 simulations.


To perform the test, we replace the random forest with a random generator that chooses a value of -1 or 1 for each quarter, and count the number of correct guesses. We repeat this process 1000 times to generate a distribution and determine the critical values for different significance levels α. We define the critical value c as the value for which Prob(x ≥ c) = α. The critical values for the 5%, 1% and 0.1% significance levels are 50.52%, 50.74% and 51.03% respectively. This means we can reject the null hypothesis at a significance level of 5%, but not at the stricter significance levels. The following table summarizes the results of the test:

Total number of quarters: 23325
Number of quarters with positive return: 11606
Number of quarters with negative return: 11719
Total correct guesses: 11823
Correct guesses: 50.69%
Critical value for α = 5%: 50.52%
Critical value for α = 1%: 50.74%
Critical value for α = 0.1%: 51.03%
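The simulated critical values can be cross-checked analytically: under the null hypothesis the number of correct guesses follows a Binomial(n, 0.5) distribution, so the critical fractions follow from its quantile function. A sketch using scipy; the resulting values closely match the simulated ones reported above:

```python
from scipy import stats

n = 23325                            # total number of testing quarters
critical = {}
# Under H0 the number of correct guesses X ~ Binomial(n, 0.5); the
# critical fraction c satisfies Prob(X >= c * n) <= alpha.
for alpha in (0.05, 0.01, 0.001):
    critical[alpha] = stats.binom.ppf(1 - alpha, n, 0.5) / n
    print(f"alpha = {alpha}: critical value = {critical[alpha]:.2%}")
```

The small differences to the simulated values stem from Monte Carlo noise in the 1000-run simulation; the analytical quantiles are exact up to the discreteness of the binomial distribution.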

4.1.2 Outperformance and Underperformance Prediction

A similar experiment can be done for a classifier that does not only predict the sign of a stock's performance for each quarter but predicts whether a stock is going to outperform or underperform its sector index. For this, we can use the Log Return Difference as defined in Chapter 3.2.1, which measures the outperformance of a stock against its index. Again, we omit the actual value and keep only the sign, so the prediction targets can again take only two values: -1 and 1. Besides the change of prediction targets, the process is identical to the one above. The results of the test are similar, but slightly different, since the underlying targets have changed. We find that the random forest classifier made 50.51% of its predictions correctly, which is not enough to reject the null hypothesis at any significance level.

Total number of quarters: 23325
Number of quarters with positive return: 14008
Number of quarters with negative return: 9317
Total correct guesses: 11823
Correct guesses: 50.51%
Critical value for α = 5%: 50.55%
Critical value for α = 1%: 50.77%
Critical value for α = 0.1%: 51.00%

4.2 Basic Experiment

Following the methods presented in Chapter 3, a first basic experiment was conducted for each prediction metric from Chapter 3.2.

4.2.1 Fixed Training Set

In the first version, the model is trained and tested with a fixed training and testing set. The random forest is trained on the complete training set and predicts the complete testing set at once. This and all following portfolios are constructed using the portfolio construction process described in Chapter 3. Since each stock has a different reporting frequency, which might also have varied during the 30-year period, each stock has a different data set. On average, each stock has a total of 200 quarters, which implies that data is not actually reported just quarterly but often comes in irregularly, at multiple occasions within a quarter, since several sheets have to be reported. This also explains the high number of quarters in the classification experiment in the previous chapter. The following simulations used the first 70% of the total quarters as the training set and the last 30% as the testing set, which corresponds to roughly 140 and 60 quarters respectively over a total time horizon of 30 years. The time axis only displays the predictions for the testing set and is therefore shorter than in the metric tests. The return axis is displayed in log scale. The alpha, Sharpe ratio and Information ratio are always calculated for the testing set only. This 70/30 split will be considered the standard for the basic experiment; different data splits will be studied and analysed later on.

Log Return Difference

Figure 4.1: Performance of a portfolio using the Log Return Difference metric and a fixed training set of 70% in comparison to the index.

The Log Return Difference portfolio clearly shows that most of the time the random forest model could not generate a portfolio better than its benchmark. This is highlighted by a negative annualized alpha and, compared to the benchmark, lower Sharpe and Information ratios during the testing period. There is a longer period of better returns between days 5500 and 6300. In general, the volatility of the portfolio seems to be slightly higher, which might result from the fact that the metric is not risk-adjusted and does not control for volatility.

Information Ratio

Figure 4.2: Performance of a portfolio using the Information ratio metric and a fixed training set of 70% in comparison to the index.

The Information ratio model is the best-performing model, with an alpha of -0.52% and a Sharpe ratio close to that of the benchmark. The resulting Information ratio is negative, which means that the random forest model was not able to build a portfolio that fulfills its target of generating an outperformance in terms of a better Information ratio. Nevertheless, the Information ratio is the highest of the three simulations and the risk-adjusted returns are better than for the Log Return Difference metric.

Sharpe Ratio

Figure 4.3: Performance of a portfolio using the Sharpe ratio metric and a fixed training set of 70% in comparison to the index.

Similar to the previous models, the Sharpe ratio experiment was not successful; in fact, it is the second worst model in terms of alpha. The portfolio seems more defensive, and its overall lower volatility results in the best Sharpe ratio of the three, close to the benchmark's. Although the random forest predictions have so far not resulted in an outperforming portfolio, the models generated portfolios that behave as their prediction metrics would imply: the Log Return Difference model has the largest deviations from the benchmark, while the Sharpe and Information ratio models have the largest Sharpe and Information ratios. This leads to the conclusion that the random forest predictions are not completely random.

4.2.2 Dynamic Training Set

The second prediction method is the dynamic training described in Chapter 3.3.3, in which the model predicts each quarter by training the random forest on the 20 prior quarters' fundamental data. The expected benefit of this method lies in using the most recent fundamental data for each quarterly prediction instead of predicting the roughly 60-quarter testing period at once as in Chapter 4.2.1: the data might have changed significantly in the meantime, so the fundamental data of the first quarter could be a bad indicator for the last quarter of the 30-year period. The total prediction period increases, since we can predict all quarters except the first 20 quarters needed as a minimum training period, instead of only the last 30% of all quarters.
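The rolling scheme can be sketched as below; a simplified illustration in which the window size and estimator settings are assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dynamic_predictions(features, target, window=20):
    """For each quarter t >= window, train a fresh random forest on the
    `window` prior quarters and predict quarter t only."""
    preds = []
    for t in range(window, len(features)):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(features[t - window:t], target[t - window:t])
        preds.append(model.predict(features[t:t + 1])[0])
    return np.array(preds)

# Toy data: 60 quarters, of which the first 20 serve only as history.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=60)
print(dynamic_predictions(X, y).shape)  # (40,)
```

Retraining at every step makes this scheme considerably more expensive than the fixed split, but each prediction only ever sees the most recent window of data.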

Log Return Difference

Figure 4.4: Performance of a portfolio using the Log Return Difference metric and a dynamic training set of 20 quarters in comparison to the index.

The dynamic prediction for the Log Return Difference delivers better results than the fixed training. The alpha has increased from -2% to -0.5% per year, leading to an increase in the Information ratio. The Sharpe ratio has decreased over the longer period for both the portfolio and the benchmark, and the portfolio still has a lower Sharpe ratio than the benchmark. The volatility of the portfolio has also increased, to 11.5% annually compared to the index volatility of 10.6% annually. This increased volatility can be observed in the drawdown from day 2000 to day 3500, where the portfolio loses significantly against the benchmark, and the drawup that follows from day 3500 to day 4800, where it catches up again. This supports the hypothesis that the Log Return Difference metric leads to increased risk and volatility. However, so far this does not lead to higher returns.

Information Ratio

Figure 4.5: Performance of a portfolio using the Information ratio metric and a dynamic training set of 20 quarters in comparison to the index.

For the Information ratio the results are dramatically worse. Although the first 1000 days look promising, the model is not able to keep up with the benchmark and loses 1.8% annually against it. The Information ratio in particular, which should have been optimized, is very low. The volatilities of portfolio and benchmark are equal at 10.6% annually, though the portfolio's volatility deviates more to the downside. It is not clear why this model has moved from best to worst; it could be that the 20-quarter training period is too short for this model. The next chapter illustrates whether and how the results change with longer training periods.

Sharpe Ratio

Figure 4.6: Performance of a portfolio using the Sharpe ratio metric and a dynamic training set of 20 quarters in comparison to the index.

The Sharpe ratio model has also improved compared to the fixed training model. The annualized relative performance has improved slightly from -1.4% to -1%. The Sharpe ratio has declined relative to the benchmark's, but is still the highest of the three models. Overall, the dynamic prediction improved results for the Log Return Difference and Sharpe ratio models and worsened results for the Information ratio portfolio. Therefore, the behavior of the portfolios is studied for various durations of the training period to gain further insights.

4.3 Sensitivity Analysis

Since the first experiments were not successful, we study the sensitivity of the models with respect to the training periods of the random forest model.

4.3.1 Fixed Training Experiment

The first sensitivity analysis is applied to the fixed training set. The training set size is varied in 10% steps from 30% to 90% of the total number of quarters. The portfolio performance is, as always, displayed for the testing set only. The results for the Log Return Difference metric are shown in the following figure:

Log Return Difference

(a) period = 30% (b) period = 40%

(c) period = 50% (d) period = 60%

(e) period = 70% (f) period = 80%

(g) period = 90%

Figure 4.7: Sensitivity analysis of portfolios using the Log Return Difference metric and fixed training sets varying from 30% to 90%.

Surprisingly, the results have improved in all of these cases, and our first experiment with a 70% training period turns out to be the worst of the simulations. Even for the shorter training periods of 30%-60%, all of the simulations generated outperformance across all three performance indicators. In addition to the 70% model, the 80% model also underperformed the benchmark, while the 90% model was able to generate significant outperformance again. This implies that the random forest model either suffered from the extension of the training set or that the predictions lost precision during the 70% and 80% periods.

Information Ratio

(a) period = 30% (b) period = 40%

(c) period = 50% (d) period = 60%

(e) period = 70% (f) period = 80%

(g) period = 90%

Figure 4.8: Sensitivity analysis of portfolios using the Information ratio metric and fixed training sets varying from 30% to 90%.

The Information ratio simulations also improved significantly. With the exception of the 30% model, where the Information ratio underperforms slightly, the portfolios are comparable to the Log Return Difference portfolios above. Again, we see that the 70% and 80% portfolios are underperforming, with a recovery in the 90% model. The range from 40%-60% outperforms strongly, more so than the Log Return Difference model, and the Information ratios are generally high.

Sharpe Ratio

(a) period = 30% (b) period = 40%

(c) period = 50% (d) period = 60%

(e) period = 70% (f) period = 80%

(g) period = 90%

Figure 4.9: Sensitivity analysis of portfolios using the Sharpe ratio metric and fixed training sets varying from 30% to 90%.

The Sharpe ratio model fits the picture of the first two metrics. All simulations have improved compared to the 70% training set model, though the 80% model also underperforms. Even in the underperforming simulations, the Sharpe ratio model generates portfolios with strong Sharpe ratios close to the benchmark's.

4.3.2 Table Form of Fixed Training Experiment

The table below provides the three performance indicators used to evaluate the portfolios (alpha, Sharpe ratio, Information ratio) and the volatilities for each portfolio and training set, in comparison with the Sharpe ratio, volatility and return of the benchmark. Outperforming metrics are coloured green, underperforming ones red, and ratios close to the benchmark yellow.

The table highlights what was already visible in the plots of the portfolios. Apart from the 30% model for the Information ratio, all portfolios outperform their benchmarks for all training sets except the 70% and 80% simulations. It also provides some general insights into the different portfolios.

The Log Return Difference portfolio seems to be the most aggressive one, having larger deviations from the benchmark than the risk-adjusted portfolios. In fact, it has the highest magnitude of alpha, both positive and negative, in 5 out of 7 simulations. Nevertheless, its volatility is lower than the benchmark's in 5 of 7 simulations. The contrary can be said about the Sharpe ratio portfolio, which has a volatility lower than or equal to the benchmark's for all training sets. It also provides the best Sharpe ratios in 5 out of 7 simulations. The Information ratio portfolio is the most volatile and deviates from the benchmark more than the first basic experiment would have suggested. There are some simulations in which the Information ratio model provides strong risk-adjusted returns, even more so than the Sharpe ratio model.

Interestingly, the different portfolios seem to follow the same under- and outperformance patterns. All three portfolios underperformed in the 70% and 80% simulations, which suggests that the predictions perform badly for these training sets independent of the prediction target. What catches the eye is that for the 70% and 80% simulations the volatility of the index is lower while its returns are higher. According to most financial theories, lower volatility would normally be associated with lower returns. These periods are therefore subject to the low-volatility anomaly, which contradicts the commonly used Capital Asset Pricing Model (Sharpe, 1964; Mossin, 1966; Lintner, 1975) but has been repeatedly observed in practice (Baker et al., 2011; Blitz et al., 2019).
It should be noted that the testing period of the 70% split starts in the middle of 2009, just after the Global Financial Crisis, which might indicate a change in the market regime and explain the higher returns as the markets rebounded. The portfolios' alphas seem to be related to this volatility/return behaviour of the benchmark: in times of low volatility the portfolios fail to outperform the benchmark, while at the same time the benchmark delivers high returns, which are obviously more difficult to beat. This explains why the prediction models perform badly in the 70% and 80% periods. The volatility also stays low for the 90% portfolio, but since returns decline, all three portfolios can outperform the benchmark again. This suggests that the performance of the random forest model is not so much sensitive to the training set as to the testing set and the market environment (volatility, return) to which it is applied.

Set   Indicator            Log Return   Sharpe Ratio   Inform. Ratio   Benchmark

30%   Alpha/Return p.a.         1.00%          0.23%          -0.11%      11.14%
      Sharpe Ratio               0.54           0.54            0.51        0.53
      Information Ratio          0.12           0.03           -0.02
      Volatility p.a.          21.30%         19.88%          20.45%      19.85%

40%   Alpha/Return p.a.         0.59%          2.43%           2.22%      10.58%
      Sharpe Ratio               0.55           0.64            0.60        0.52
      Information Ratio          0.10           0.29            0.39
      Volatility p.a.          19.31%         19.25%          20.02%      19.46%

50%   Alpha/Return p.a.         2.06%          1.82%           1.34%      11.50%
      Sharpe Ratio               0.63           0.67            0.64        0.57
      Information Ratio          0.35           0.25            0.18
      Volatility p.a.          18.87%         18.64%          18.99%      18.98%

60%   Alpha/Return p.a.         0.77%          2.56%           3.30%      10.39%
      Sharpe Ratio               0.53           0.61            0.64        0.49
      Information Ratio          0.12           0.44            0.50
      Volatility p.a.          20.02%         19.95%          19.98%      20.12%

70%   Alpha/Return p.a.        -1.69%         -1.19%          -0.45%      17.45%
      Sharpe Ratio               0.96           1.01            0.99        1.02
      Information Ratio         -0.27          -0.19           -0.11
      Volatility p.a.          15.21%         14.94%          15.86%      15.77%

80%   Alpha/Return p.a.        -1.95%         -0.64%          -0.78%      16.47%
      Sharpe Ratio               1.03           1.11            1.06        1.12
      Information Ratio         -0.34          -0.11           -0.16
      Volatility p.a.          13.17%         13.20%          13.78%      13.64%

90%   Alpha/Return p.a.         3.08%          0.96%           1.78%       9.46%
      Sharpe Ratio               0.82           0.73            0.76        0.64
      Information Ratio          0.55           0.19            0.40
      Volatility p.a.          14.34%         13.61%          14.02%      14.10%

Table 4.1: Tabular form of the sensitivity analysis with additional information such as volatilities. Outperforming ratios are green, neutral-performing ratios yellow, and underperforming ratios red.

4.3.3 Comparison of Training Sets

To understand the performance of the different portfolios during their prediction periods in more detail, we break down the performance of each portfolio into smaller periods and study their behaviour. The periods start at the smallest training set of 30%, since earlier predictions are not available. The following tables display the annualised alpha each portfolio generated versus its benchmark for different periods. Additionally, the returns and volatilities of the benchmark are shown for the same periods.
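The period breakdown can be sketched as follows; a minimal illustration in which active returns are bucketed by percentile boundaries of the quarter axis. The simple excess-return alpha, the factor of 4 for annualising quarterly returns, and all names are assumptions of this sketch:

```python
import numpy as np

def alpha_by_bucket(port_ret, bench_ret, edges):
    """Annualised simple alpha per percentile bucket of the quarter axis.
    `edges` are fractions such as [0.3, 0.4, ..., 1.0]; returns are
    assumed to be quarterly, hence the factor of 4."""
    n = len(port_ret)
    out = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        i, j = int(lo * n), int(hi * n)
        active = np.asarray(port_ret[i:j]) - np.asarray(bench_ret[i:j])
        out[f"{int(lo * 100)}-{int(hi * 100)}%"] = np.mean(active) * 4
    return out

# Toy check: a portfolio beating its benchmark by 1% every quarter
# shows a 4% annualised alpha in every bucket.
bench = np.zeros(100)
port = bench + 0.01
print(alpha_by_bucket(port, bench, [0.3, 0.4, 0.5, 1.0]))
```

The same bucketing applied to each training-set variant produces the staircase tables below, since each variant only has predictions from its own split point onwards.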

Log Return Difference

                   Quarter Periods in Percentiles
Simulations        30-40%   40-50%   50-60%   60-70%   70-80%   80-90%   90-100%
Log Return 30%     -0.70%    2.10%    5.70%   -0.83%   -3.39%    0.82%    2.83%
Log Return 40%               8.07%    3.75%   -0.92%   -3.72%   -1.29%   -0.68%
Log Return 50%                        3.07%    2.20%    5.70%    0.06%   -0.51%
Log Return 60%                                 0.46%   -0.44%   -4.14%    4.32%
Log Return 70%                                         -0.21%   -4.58%   -1.55%
Log Return 80%                                                  -5.03%   -0.42%
Log Return 90%                                                            3.08%

Average Alpha      -0.70%    5.09%    4.17%    0.23%   -0.41%   -2.36%    1.01%
Index Return       15.03%    5.25%   15.67%  -14.50%   19.50%   29.40%    9.46%
Index Volatility   22.40%   22.33%   13.04%   32.00%   19.48%   12.84%   14.10%

Table 4.2: Detailed analysis of Log Return Difference portfolios’ alpha during different time periods.

The detailed Log Return Difference table shows that the underperformance we saw for the 70% and 80% portfolios also occurs for the 30%, 40% and 60% portfolios during the same periods. This means that, independent of the training period, there is a tendency to underperform in these periods. The table also shows what was already indicated in the previous chapter: when the volatility of the benchmark falls, the return of the benchmark increases. Very strong prediction periods are the 40%-60% percentiles. The 60%-70% percentile includes the previously mentioned Global Financial Crisis, with significantly negative returns and the maximum volatility, after which the random forest models seem to lose predictive power.

Information Ratio

                      Quarter Periods in Percentiles
Simulations           30-40%   40-50%   50-60%   60-70%   70-80%   80-90%   90-100%
Information 30%       -2.03%    3.95%   -1.47%    4.30%   -1.32%   -1.49%   -2.58%
Information 40%                10.04%    0.77%   -2.99%    3.76%    5.51%    0.06%
Information 50%                         -0.60%    3.35%    7.06%    0.20%   -2.37%
Information 60%                                   6.13%    7.28%    3.71%   -1.75%
Information 70%                                           -0.69%    2.88%   -1.89%
Information 80%                                                    -1.18%   -0.61%
Information 90%                                                              1.78%

Average Alpha         -2.03%    7.00%   -0.43%    2.70%    3.22%    1.61%   -1.05%
Index Return          15.03%    5.25%   15.67%  -14.50%   19.50%   29.40%    9.46%
Index Volatility      22.40%   22.33%   13.04%   32.00%   19.48%   12.84%   14.10%

Table 4.3: Detailed analysis of Information ratio portfolios’ alpha during different time periods.

The Information ratio portfolios follow a different pattern. On average, the portfolios do not perform worse during the 70%-80% percentile. The 40%-60% portfolios actually outperform the benchmark notably in that period, which implies that at least the Information ratio models' performance during that period clearly depends on the training set. In particular, the 50% and 60% models, trained until just before that prediction period, perform best.

Sharpe Ratio

                   Quarter Periods in Percentiles
Simulations        30-40%   40-50%   50-60%   60-70%   70-80%   80-90%   90-100%
Sharpe 30%          0.60%    6.19%    2.75%   -1.48%   -0.27%   -3.34%   -1.84%
Sharpe 40%                  10.71%   -1.50%   -1.75%    0.54%    6.79%    0.24%
Sharpe 50%                            0.17%    8.02%    4.06%   -0.32%   -2.19%
Sharpe 60%                                     5.41%    4.10%    0.59%    0.37%
Sharpe 70%                                              1.38%   -4.78%   -1.15%
Sharpe 80%                                                      -3.93%    1.00%
Sharpe 90%                                                                0.96%

Average Alpha       0.60%    8.45%    0.47%    2.55%    1.96%   -0.83%   -0.37%
Index Return       15.03%    5.25%   15.67%  -14.50%   19.50%   29.40%    9.46%
Index Volatility   22.40%   22.33%   13.04%   32.00%   19.48%   12.84%   14.10%

Table 4.4: Detailed analysis of Sharpe ratio portfolios’ alpha during different time periods.

The Sharpe ratio model performs well for the first 80% of the prediction periods, followed by a negative alpha in the final percentiles. In general, this portfolio seems to be the most stable and has the highest average alpha of all portfolios and periods seen so far. The three low-volatility intervals, namely the 50%-60%, 80%-90% and 90%-100% percentiles, are at the same time the lowest-alpha periods, which implies that the effectiveness of the prediction metric depends on the underlying volatility. More interesting than the behaviour of the portfolios during the 70%-90% periods is in fact the behaviour of all three models during the 40%-50% and 60%-70% periods. These sections are defined by low returns of the benchmark and strong outperformance of the random forest models. The 40%-50% period includes the Dot Com Bubble, while the 60%-70% period covers the Great Financial Crisis.

It appears that the random forest model works especially well in drawdowns. To verify this, we briefly study the portfolio performance during those exact crises. The drawdown of the S&P 500 during the Dot Com Bubble lasted from the peak on March 24, 2000 to the low on October 9, 2002, and the Great Financial Crisis from October 11, 2007 to March 9, 2009 (Yardeni, 2020). The only relevant portfolios are therefore the 30%-60% models, since the other ones did not experience any downturns in their testing periods. The table displays the alpha generated versus the benchmark for each of the portfolios during the Dot Com Bubble and the Great Financial Crisis.

Portfolio                 Dot Com Bubble   Great Financial Crisis
Log Return 30%                    -3.62%                   -4.68%
Log Return 40%                     2.61%                    0.40%
Log Return 50%                                              0.96%
Log Return 60%                                              0.30%
Information Ratio 30%             -5.68%                    1.05%
Information Ratio 40%              9.02%                   -5.35%
Information Ratio 50%                                       0.18%
Information Ratio 60%                                      10.87%
Sharpe Ratio 30%                   1.97%                   -1.77%
Sharpe Ratio 40%                   6.86%                    9.23%
Sharpe Ratio 50%                                           10.31%
Sharpe Ratio 60%                                           11.89%

Table 4.5: Annualised alphas for the differently trained portfolios during the recessions following the Dot Com Bubble and the Great Financial Crisis.

Although there are exceptions in which the portfolios underperformed their benchmarks during the recessions, the majority tends to deliver a better performance. In particular, the portfolios trained until shortly before the crises occurred (the 40% portfolios for the Dot Com Bubble and the 60% portfolios for the Great Financial Crisis) outperform significantly. The Log Return Difference portfolios, as already seen before, perform worse than the risk-adjusted metrics. The Sharpe ratio portfolio, which attempts to predict the absolute risk-adjusted return of each stock independent of the benchmark, is particularly successful. Since our stock set of the S&P 500 index performs significantly better than the actual S&P 500 index, the outperformance would be even larger when measured against the real index.

4.3.4 Dynamic Training Experiment

Similar to the analysis for the fixed training set, a sensitivity analysis for the dynamic training method can be conducted by varying the number of historical quarters the random forest model is trained on. Such an analysis has been carried out for training windows ranging from 10 to 150 quarters and can be found in Appendix C. This chapter only displays the simulations that have consistently provided good and bad performance across the different prediction metrics. For all metrics, the random forest models trained on 20 quarters have underperformed the benchmark, as the following plots show:

(a) Portfolio = Log Return Difference

(b) Portfolio = Information Ratio

(c) Portfolio = Sharpe Ratio

Figure 4.10: Portfolios for all three prediction metrics, each trained on 20 quarters in comparison to the index.

The opposite can be said about the portfolios trained on the last 75 quarters for each prediction:

(a) Portfolio = Log Return Difference

(b) Portfolio = Information Ratio

(c) Portfolio = Sharpe Ratio

Figure 4.11: Portfolios for all three prediction metrics, each trained on 75 quarters in comparison to the index.

As shown in the first experiment of Chapter 4.1.2, the Information ratio performs worst of the three, and even in the successful 75-quarter simulation it outperforms only in the last 500 days, whereas the other two simulations outperform more consistently in that scenario. Again, one can wonder whether the difference in performance between the 20-quarter and the 75-quarter models is based on the different training sets or on the fact that the latter model predicts a shorter period: it needs a longer training period in the beginning, and this shorter prediction period might be easier to predict.

In order to unravel the cause of this divergence, we compare the two simulations over the same time interval:

(a) Portfolio = Log Return Difference

(b) Portfolio = Information Ratio

(c) Portfolio = Sharpe Ratio

Figure 4.12: Portfolios for all three prediction metrics, each trained on 20 and 75 quarters for an identical interval with normalized performances. These plots highlight that the testing set has a significant influence on portfolio performance.

We find that the 75-quarter model still performs better than the 20-quarter model, although the annualized alpha of the 20-quarter model has improved independently of the prediction metric. This implies that the size of the training set does matter, but also that the models in general predict better results for the shorter time horizon.

4.3.5 Industry Performance

Besides looking at the overall performance of the portfolio, we can also study the performance at an industry level. Since the algorithm always holds one stock per sector, we can see how those stocks have performed over time compared to their sector benchmarks, and potentially find patterns that could help understand what drives prediction accuracy. For this purpose we can reuse the results from the sensitivity analysis, which also include the sector portfolios and benchmarks. The following two plots display the performance of the sector "Banks", an example of a well-predicted industry, and the sector "Beverages", for which the portfolios have mostly underperformed. The plots for the remaining industries can be found in Appendix C.

(a) "Banks" Portfolio Performance

(b) "Beverages" Portfolio Performance

Figure 4.13: Comparison of the underlying sector (sub-)portfolios.

In general, it is difficult to see a pattern in the sector performances. The "Banks" sector has more stocks and fewer fundamentals than the worse-performing "Beverages" sector; however, the remaining sectors do not indicate that this is a general rule. As could be expected from the sensitivity analysis, the dynamic predictions also lead to worse results at the sector level than the fixed training set.

4.4 Relative Features Experiment

The sensitivity analysis has shown that there are portfolios that can outperform the benchmark, but also that the performance depends on the time intervals evaluated. The next experiment changes the features used for the prediction of the random forest model, while the prediction targets remain the previously used Log Return Difference, Information ratio and Sharpe ratio. The Relative Features experiment feeds the random forest predictor the quarterly changes of the fundamentals instead of their absolute values.

This is done for both ratio fundamentals (e.g. dividend yield, P/E ratio) and absolute fundamentals (e.g. assets, debt). The reasoning behind using changes in fundamentals as opposed to absolute values is that markets could be affected more by changes in a firm's current fundamental data than by the level at which the fundamentals are.
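A minimal sketch of this transformation, assuming the fundamentals are held in a per-stock table indexed by quarter; the quarter-over-quarter relative change via pandas' `pct_change` is one plausible implementation, since the exact formula is not spelled out here, and the column names are hypothetical:

```python
import pandas as pd

fundamentals = pd.DataFrame({
    "dividend_yield": [2.0, 2.2, 2.4, 2.2],      # hypothetical quarterly values
    "total_assets":   [100.0, 105.0, 110.0, 108.0],
})

# Quarter-over-quarter relative change; the first quarter has no
# predecessor and is dropped.
relative = fundamentals.pct_change().dropna()
print(relative)
```

One quarter of history is lost per stock, but the resulting features are scale-free, which also makes fundamentals of differently sized firms more comparable.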

The ambiguity of these two approaches can be understood with the following example: firm A historically had varying dividends and low returns, but lately has seen significant growth in its dividends over the last 10 quarters, resulting in a dividend yield far above the historical average of the stock. This led to a better performance of the stock in the same period. Firm A now publishes the fundamental data of the last quarter and reports that it will reduce the dividend for the first time in 10 quarters, to the level of 2 quarters ago. The dividend is therefore still at a high level but falling. The random forest model could interpret this information in two ways:

• The random forest model has been trained on the absolute dividend yields and associates above-average dividends with good future returns. It should therefore predict a positive performance for the next quarter, possibly in line with the performance of 2 quarters ago.

• The random forest model is trained on the change in dividends and recognises a negative change for the first time in the last 10 quarters. This will lead to a prediction of negative returns for the next quarter.

It is not clear which model will perform better compared to the benchmark, since it is not obvious how the market sets its expectations for the next quarter and values firm A.

4.4.1 Fixed Training Set

The portfolios for the fixed training sets are constructed for the 50% training period, which was successful in the sensitivity analysis of the previous chapter, and the 70% training period, which was not. Again, we split the data set into a training set (50% or 70%) and a testing set. Instead of the absolute fundamental values, we feed the random forest predictor with the changes in fundamentals from quarter to quarter. The targets remain the Log Return Difference, Sharpe and Information ratios as before.

Log Return Difference

(a) period = 50%

(b) period = 70%

Figure 4.14: Portfolios for the Log Return Difference metric of the 50% and 70% fixed training sets using relative input features.

Both portfolios perform better with the relative features than before. In particular, the 70% portfolio has changed from underperforming to outperforming, whereas the 50% portfolio has not improved its alpha significantly. In absolute terms, the 50% portfolio still performs better than the 70% portfolio. The Sharpe and Information ratios are also better than the benchmark's.

Information Ratio

(a) period = 50%

(b) period = 70%

Figure 4.15: Portfolios for the Information ratio metric of the 50% and 70% fixed training sets using relative input features.

Similar improvements can be found for the Information ratio portfolios. The 50% portfolio has benefited more from the change, though, and is clearly outperforming with an alpha of 5.6% p.a., an increase of more than 4% p.a. over the absolute-fundamentals portfolio. The Information ratio of that portfolio is also the highest seen yet. The 70% portfolio increases its alpha by about 2% p.a. to 1.67% p.a. and thus finally outperforms as well.

Sharpe Ratio

The Sharpe ratio model completes the picture, generating two outperforming portfolios that do not deviate much in performance. The 50% portfolio is still the stronger one.

(a) period = 50%

(b) period = 70%

Figure 4.16: Portfolios for the Sharpe ratio metric of the 50% and 70% fixed training sets using relative input features.

The experiment for the fixed training set has shown that the relative-fundamentals approach provides better results than the absolute-fundamentals approach. All six simulations improved through that change and ultimately outperform. Nevertheless, the core structure of the portfolios' performance has not changed much: the 50% portfolios still perform better than the 70% portfolios.

4.4.2 Dynamic Training Set

This effect can be studied further for the dynamic training sets. Again, we use only the best and the worst windows from the sensitivity analysis for the three prediction metrics, namely the 20-quarter and 75-quarter training periods.

Log Return Difference

The Log Return Difference model used to perform better for the 75-quarter than for the 20-quarter version. This has changed, and the 20-quarter portfolio now generates a higher alpha, while the Information ratio is still higher for the 75-quarter portfolio.

(a) period = 20 quarters

(b) period = 75 quarters

Figure 4.17: Portfolios for the Log Return Difference metric of the 20 and 75 quarter dynamic training sets using relative input features.

Information Ratio

We see similar results for the Information ratio portfolios, although here the order did not change and the 75-quarter portfolio is still the better one, with an outperformance of 5.36% p.a. and an Information ratio of 0.64.

(a) period = 20 quarters

(b) period = 75 quarters

Figure 4.18: Portfolios for the Information ratio metric of the 20 and 75 quarter dynamic training sets using relative input features.

Sharpe Ratio

The 75-quarter Sharpe ratio portfolio has generated the highest alpha seen so far, 5.43% p.a., with a Sharpe ratio of 0.76. Its Information ratio of 0.58 is slightly behind that of the 75-quarter Information ratio portfolio but is still high.

(a) period = 20 quarters

(b) period = 75 quarters

Figure 4.19: Portfolios for the Sharpe ratio metric of the 20 and 75 quarter dynamic training sets using relative input features.

The dynamically trained portfolios confirm what we have seen for the fixed-training portfolios using the relative-features approach. The results have improved substantially, and without exception all portfolios have outperformed their benchmarks. To strengthen these results, we also implement the simple random forest classifier from Chapter 4.1 for the relative-features experiment. The experiment is conducted analogously to Chapter 4.1 but with relative fundamental values, as described in the introduction of Chapter 4.4. The results support our findings from above: the classifier's correct guesses are 51.7% for the absolute performance model and 55.0% for the out-/underperformance model respectively. Both predictors can be accepted at a significance level of α = 0.1%.

4.5 Sector Features Experiment

Until now we used all available fundamental data of each stock as features for the random forest model. This includes, as defined in Chapter 2.2, the balance sheet, cashflow statement, income statement, ratio metrics and profit ratios of each stock. To identify generally applicable rules and relations between the fundamental data and the performance of the stocks, it is interesting to analyse how the portfolios perform when trained on a subset of features that are available for all stocks in a sector. We therefore compare four different models:

• Model (a) is the original portfolio with a training period of 50% from the chapter above.

• Model (b) uses only the ratio fundamental data as features, namely the ratio metrics and profit ratios. All other fundamental data is avoided because it is not normalised and varies broadly from stock to stock.

• Model (c) uses the ratio fundamentals from model (b) intersected per sector, meaning only the fundamentals reported by every stock in that sector are used, and stock-specific fundamental data is avoided. The random forest predictor is initiated for each stock individually and trained on that stock's features only.

• Model (d) uses the same intersected sector features as model (c), but the random forest predictor is initiated once per sector and trained on the features of all stocks in that sector. Each stock in the sector is then predicted by the sector's random forest. This is based on the assumption that all stocks in a sector are driven similarly by the ratio fundamentals and that the random forest predictor is able to identify general relationships that apply to all of them.
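The feature restriction of models (c) and (d) can be sketched as a plain set intersection over the ratio fundamentals each stock in a sector reports; the stock and column names below are hypothetical:

```python
def sector_feature_intersection(reported):
    """Map of stock -> available ratio fundamentals; returns the
    features reported by every stock in the sector, sorted."""
    common = None
    for cols in reported.values():
        common = set(cols) if common is None else common & set(cols)
    return sorted(common) if common else []

banks = {
    "BANK_A": {"pe_ratio", "roe", "dividend_yield"},
    "BANK_B": {"pe_ratio", "roe", "price_to_book"},
}
print(sector_feature_intersection(banks))  # ['pe_ratio', 'roe']
```

Model (c) would then train one random forest per stock on these common columns, while model (d) would pool the rows of all stocks in the sector into a single training set for one shared predictor.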

Log Return Difference

(a) Relative Features (b) Relative Ratio Features

(c) Sector Features, One Model per Stock (d) Sector Features, One Model per Sector

Figure 4.20: Portfolio experiments using different sets of fundamentals and random forests for the Log Return Difference metric and a fixed training set size of 50%

Portfolio (a) is the 50% fixed training portfolio from Chapter 4.3 that generated an alpha of 2.55% annually. Model (b) uses the same approach, only reduced to ratio fundamentals. This reduction of features leads to a reduced outperformance of that portfolio, which implies that the random forest model suffers from a lack of information and hence that a blanket reduction of features is not useful; one should rather be more selective when reducing the number of features. Model (c), which has been trained only on the features that all stocks in a sector have in common, performs better than both the original portfolio and the ratio portfolio. Apparently, filtering the ratio fundamentals for their relevance in a given sector has improved the predictions because less important features have been avoided. Model (d) uses the same features as model (c) but trains only one random forest predictor per sector, which it then also uses for the predictions. For this model it was important to use only ratio fundamentals, because balance sheet, cashflow and income statements vary significantly with the size of a firm, and one predictor per sector would not suffice. The sector predictor delivers worse results than the stock predictor. Nevertheless, it does perform better than the ratio model without the sector-filtered features.

Information Ratio

(a) Relative Features (b) Relative Ratio Features

(c) Sector Features, One Model per Stock (d) Sector Features, One Model per Sector

Figure 4.21: Portfolio experiments using different sets of fundamentals and random forests for the Information ratio metric and a fixed training set size of 50%

Sharpe Ratio

(a) Relative Features (b) Relative Ratio Features

(c) Sector Features, One Model per Stock (d) Sector Features, One Model per Sector

Figure 4.22: Portfolio experiments using different sets of fundamentals and random forests for the Sharpe ratio metric and a fixed training set size of 50%

We see comparable behaviour for the Information and Sharpe ratio metrics, with the exception of the sector model, which underperforms in both cases. Although reducing the features to the subset of ratio fundamentals led in most cases to a loss in performance, we see a benefit in filtering those ratio fundamentals by sector intersections while initiating random forest predictors for each stock: independent of the prediction metric, all (c) models show an increase in performance versus model (b).

Therefore, it might be worthwhile to use the full set of fundamentals but filter it for sector features as above, since that filtering provided a benefit. This leads to the following portfolios:

(a) Log Return Difference, All Sector Features

(b) Information Ratio, All Sector Features

(c) Sharpe Ratio, All Sector Features

Figure 4.23: Portfolios for all three metrics using all fundamentals as features but filtered for sector intersections and a fixed training set of 50%.

We see that the portfolios using the full set of fundamentals have improved markedly compared to the ratios-only portfolios, with both being filtered for sector intersections. In addition to the Information ratio portfolio, the Log Return Difference and Sharpe ratio portfolios have also improved relative to the sector-wise unfiltered models, and both are the strongest portfolios we have seen for their respective prediction metrics. It is worth noting that the Log Return Difference model, whose goal is to maximize the generated alpha, does in fact achieve the highest alpha of all three portfolios. Similarly, the Information ratio and Sharpe ratio portfolios lead in their respective performance indicators. This highlights that the random forest model is not just able to pick generally well performing stocks, but also that the predictions are precise enough to actually differentiate between the different prediction targets, so that the portfolios behave as one would expect.

Chapter 5

Conclusion

This project implemented a portfolio management algorithm with the goal of beating the S&P 500 index. Thus the final step of this thesis is to analyze and interpret the results with respect to this goal. Initially, we needed to find prediction metrics (Log Return Difference, Information ratio and Sharpe ratio) that indicate stock (out-)performance. It could easily be verified that all three metrics, including the Sharpe ratio, which does not directly imply outperformance, lead to superior returns if predicted correctly. The challenge was to find data that could be used to predict those metrics. To this end, the necessary fundamental data of all stocks in the S&P 500 index was obtained. An exploratory analysis of this fundamental data was conducted to find relationships between stocks' fundamentals and returns. As part of the exploratory analysis, we introduced the concept of copulas, which are widely used in finance as statistical tools to measure the dependence between probability distributions. We found that the correlation between the different fundamentals can be strong, either because some fundamentals can be converted into one another, which is trivial information, so that the duplicated features can be avoided, or because they are non-trivially correlated, as in the case of the "reinvestment rate" with several earnings ratios. As one would expect, the dependencies between fundamentals and returns were significantly weaker, and no single fundamental can substantially influence future stock returns on its own. However, we did find a set of fundamentals whose correlation with stock performance looks promising. For example, the observed behaviour between profitability ratios (e.g. the P/E ratio, EV/EBITDA and dividends) and future returns is in line with the observations of Shiller (2015); Lewellen (2004); Pontiff and Schall (1998). Nevertheless, fundamentals are only one set of variables affecting stock prices.
There is other information not contained in fundamental data that influences stock performance. For instance, the current coronavirus pandemic is an example of an exogenous shock to which markets have reacted quickly, although the consequences of the pandemic are not yet fully incorporated in the fundamentals. This highlights a weakness of the random forest predictor: predictions and rebalancing can only be done once new fundamental data is reported. We have seen in Table 4.5, however, that the random forest portfolio worked well in past crises. Therefore, once the full extent of the crisis is incorporated in the fundamental data, a rational model like the random forest predictor could be a useful valuation tool, especially since crises often lead to misvaluations and an increase in market arbitrage.


In the first setup, we trained a random forest classifier to identify stocks that would have a positive or negative performance over the next quarter, in both absolute terms and relative to the benchmark. The results showed that the random forest classifier correctly forecasts the stock returns about 50.5% of the time for both experiments. Although these results are only slightly better than a coin toss, we could show that at least the absolute performance forecast was statistically significant. Eventually, we used the insights from the classification experiment and the exploratory analysis to generate portfolios in practice. For this purpose, we applied two different training methods. For the first one, we split the data into fixed training and testing sets. The second, dynamic training technique tried to replicate a real-life scenario: if this algorithm were actually implemented, one would only forecast stock returns for one quarter at a time, based on a smaller set of historical training quarters. All variations of the experiments were conducted for the three prediction metrics defined in Chapter 3.2. The first experiments provided ambiguous results. Depending on the testing set, the portfolios out- or underperformed the benchmark, with a slight tendency to outperform, as we saw in the sensitivity analysis. These vague results are not surprising, since the classification experiment already highlighted the difficulty of predicting the stock returns correctly. We also find that the performance of the portfolios depends significantly neither on the training technique nor on the training horizon. The portfolio return is more dependent on the testing set and on changes in the market regime (e.g. changes from bull to bear markets). Additionally, the experiments support the argument that random forests rarely overfit: the predictions do not change significantly for duplicated or, more generally, larger sets of fundamentals.
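The two training schemes can be sketched as follows; this is an illustrative stand-in (function names, quarter labels and the window length are assumptions, not the thesis implementation). The fixed scheme trains once on an initial fraction of quarters, while the dynamic scheme re-trains every quarter on a rolling window of past quarters, mimicking live use.

```python
# Sketch of the two training schemes: fixed split vs. dynamic (walk-forward).
def fixed_split(quarters, train_frac=0.5):
    """Train on the first train_frac of quarters, test on the rest."""
    cut = int(len(quarters) * train_frac)
    return quarters[:cut], quarters[cut:]

def dynamic_windows(quarters, window=8):
    """Yield (training window, next quarter to predict) pairs,
    sliding forward one quarter at a time."""
    for i in range(window, len(quarters)):
        yield quarters[i - window:i], quarters[i]
```

In the dynamic scheme, each quarter is predicted only from data available before it, which is the property that makes it a realistic replication of live trading.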
Our final adaptation of the input fundamentals had a substantial impact on portfolio performance. In Chapter 4.4 we introduced an experiment in which the changes in fundamentals, instead of their absolute values, were used for the model. This led to improved forecasts of the random forest predictor. Ultimately, all portfolios, for all three prediction metrics and varying training techniques and time horizons, were successful in outperforming the benchmark. The best portfolios reached outperformances α of more than 5% annually. The consistency and magnitude of these returns is unexpected. To verify these results, we also implemented the basic classification experiment of Chapter 4.1 using the changed fundamental inputs of Chapter 4.4. Again, we trained a random forest classifier to identify stocks that would have a positive or negative performance over the next quarter, in both absolute terms and relative to the benchmark. The results for this experiment improved in line with the previous portfolios: the random forest classifier reaches a correct prediction rate of up to 55%. A subsequent significance test shows that the results can be accepted at a significance level of 0.1% against a randomized predictor. Both experiments suggest that the random forest prediction model can in fact be used to build outperforming portfolios. This result clearly contradicts the Efficient Market Hypothesis, which we referred to at the beginning of the thesis. According to the EMH, it should be impossible to outperform the market in the long run. This statement is supported by Malkiel (1999), who provocatively claims that "a blindfolded monkey throwing darts at a newspaper's financial pages could select a portfolio that would do just as well as one carefully selected by experts." Malkiel's example can be interpreted in two ways for our case.
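The relative-features transformation of Chapter 4.4 can be sketched as a one-liner on a quarter-indexed table; the DataFrame layout and function name are hypothetical stand-ins for the thesis pipeline.

```python
# Sketch of the relative-features transformation: quarter-over-quarter
# changes replace the absolute fundamental values fed to the model.
import pandas as pd

def to_relative_features(fundamentals: pd.DataFrame) -> pd.DataFrame:
    """Percentage change of each fundamental from the previous quarter.

    fundamentals: rows = quarters, columns = fundamental items. The first
    quarter is dropped because it has no predecessor; zero-valued
    denominators would need extra handling in practice.
    """
    return fundamentals.sort_index().pct_change(fill_method=None).iloc[1:]
```

Feeding changes rather than levels removes much of the scale difference between firms, which is one plausible reason the forecasts improve.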
First, the success of the random forest generated portfolios could be a coincidence, since even a randomly created portfolio should have a 50% chance of outperforming its index if we assume that the median return of an index-based portfolio is equal to the index return. Second, if a randomly built portfolio on average generates the market return, the random forest predictor only needs to be slightly better than a random predictor to outperform the market. Since the basic classification experiment suggests exactly this, we conclude that the portfolios' outperformance is legitimate.

The presented approach could be further improved through several additions or changes:

• Additional data could be appended to the fundamentals. For example, following the APT model, macroeconomic data like inflation or yield curves could be used to further improve the predictions.

• The random forest predictor could be replaced by other, more sophisticated machine-learning techniques to improve the predictions independently of the training data.

• The portfolio generation could follow different rules. Instead of just equal-weighting one stock per sector, one could apply market-capitalization weights or portfolio optimization to improve the portfolio performance independently of the predictions.

• The return forecasts could be combined with other trading strategies.

The points above should be considered as suggestions and are surely not an exhaustive list of variations that could be implemented to improve the model. We conclude that the only way to have certainty about the correctness of the model developed in this thesis is to observe a practical implementation of it over the long term.

Bibliography

Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. The Annals of Mathematical Statistics, pages 193–212.

Baker, M., Bradley, B., and Wurgler, J. (2011). Benchmarks as limits to arbitrage: Understanding the low-volatility anomaly. Financial Analysts Journal, 67(1):40–54.

Black, F. (1972). The capital asset pricing model: Some empirical tests. Studies in the Theory of Capital Markets, 81(3):79–121.

Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3):637–654.

Blitz, D., van Vliet, P., and Baltussen, G. (2019). The volatility effect revisited. The Journal of Portfolio Management, 46(2):45–63.

Bok, D. (2020). Copulae package.

Breeden, D. T. (2005). An intertemporal asset pricing model with stochastic consumption and investment opportunities. In Theory of Valuation, pages 53–96. World Scientific.

Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2):123–140.

Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.

Breiman, L. (2020). Using random forests.

Chen, N.-F. (1983). Some empirical tests of the theory of arbitrage pricing. The Journal of Finance, 38(5):1393–1414.

Embrechts, P., Lindskog, F., and McNeil, A. (2001). Modelling dependence with copulas. Rapport technique, Département de mathématiques, Institut Fédéral de Technologie de Zurich, Zurich, 14.

Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2):383–417.

Fama, E. F. and French, K. R. (2004). The capital asset pricing model: Theory and evidence. Journal of Economic Perspectives, 18(3):25–46.

Graham, B. and Dodd, D. (1934). Security Analysis: The Classic 1934 Edition. McGraw Hill.

Groenewold, N. and Fraser, P. (1997). Share prices and macroeconomic factors. Journal of Business Finance & Accounting, 24(9-10):1367–1383.

Kidd, D. (2011). The Sharpe ratio and the information ratio. Investment Performance Measurement Feature Articles, 2011(1):1–4.

Kole, E., Koedijk, K., and Verbeek, M. (2007). Selecting copulas for risk management. Journal of Banking & Finance, 31(8):2405–2423.


Lewellen, J. (2004). Predicting returns with financial ratios. Journal of Financial Economics, 74(2):209–235.

Li, D. X. (2000). On default correlation: A copula function approach. The Journal of Fixed Income, 9(4):43–54.

Liaw, A. (2002). Classification and regression by randomForest. R News, 2(3):18–22.

Lintner, J. (1975). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. In Stochastic Optimization Models in Finance, pages 131–155. Elsevier.

Malkiel, B. G. (1999). A random walk down Wall Street: including a life-cycle guide to personal investing. WW Norton & Company.

Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica: Journal of the Econometric Society, pages 867–887.

Meyer, C. (2013). The bivariate normal copula. Communications in Statistics-Theory and Methods, 42(13):2402–2422.

Mossin, J. (1966). Equilibrium in a capital asset market. Econometrica: Journal of the Econometric Society, pages 768–783.

Nelsen, R. (1999). An Introduction to Copulas. Springer Verlag.

Nelsen, R. (2001). Kendall tau metric. In Hazewinkel, M., editor, Encyclopedia of Mathematics.

Papaioannou, G., Kohnová, S., Bacigál, T., Szolgay, J., Hlavčová, K., and Loukas, A. (2016). Joint modelling of flood peaks and volumes: A copula application for the Danube River. Journal of Hydrology and Hydromechanics, 64(4):382–392.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830.

Pontiff, J. and Schall, L. D. (1998). Book-to-market ratios as predictors of market returns. Journal of Financial Economics, 49(2):141–160.

Reuters, T. (2019). Thomson Reuters Eikon.

Roll, R. and Ross, S. A. (1980). An empirical investigation of the arbitrage pricing theory. The Journal of Finance,35(5):1073–1103.

Roll, R. and Ross, S. A. (1984). The arbitrage pricing theory approach to strategic portfolio planning. Financial Analysts Journal, 40(3):14–26.

Rubinstein, M. E. (1973). A mean-variance synthesis of corporate financial theory. The Journal of Finance, 28(1):167–181.

Russell, FTSE (2019). Industry Classification Benchmark.

Salleh, N., Yusof, F., and Yusop, Z. (2016). Bivariate copulas functions for flood frequency analysis. In AIP Conference Proceedings, volume 1750, page 060007. AIP Publishing LLC.

Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. The Journal of Finance, 19(3):425–442.

Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1):119–138.

Shiller, R. J. (2015). Irrational Exuberance: Revised and Expanded Third Edition. Princeton University Press.

Sornette, D., Malevergne, Y., et al. (2003). Testing the gaussian copula hypothesis for financial assets dependences. Quantitative Finance, 3(4):231–250.

Yardeni, E. (2020). S&P 500 bull & bear market tables.

Appendix A

Data

A.1 Introduction

Influence of P/E ratios on long-term performance

Figure A.1: From Robert Shiller's Irrational Exuberance


A.2 Data

List of Stocks by Sectors

Aerospace and Defense: LMT NOC TXT GD RTN BA UTX
Automobiles and Parts: HOG F
Banks: HBAN.O BBT PBCT.O C CMA WFC MTB ZION.O KEY PNC RF USB FITB.O STI SIVB.O
Beverages: PEP.O KO MNST.O TAP
Chemicals: AVY DWDP.K ECL PPG IFF APD
Construction and Materials: VMC SHW AOS
Electricity: AEP ED CMS EVRG.K PNW XEL.O NEE PPL LNT.O SO PEG ES
Electronic and Electrical Equipment: PKI AME EMR
Financial Services (Sector): SPGI.K RJF NTRS.O BEN TROW.O BK EFX STT SCHW.K
Fixed Line Telecommunications: CTL VZ T
Food Producers: K TSN CPB GIS CAG ADM MKC HSY
Food and Drug Retailers: CVS WBA.O KR
Gas, Water and Multiutilities: WEC NI ATO DUK CNP
General Industrials: HON DHR GE BLL SEE MMM PH ETN ARNC.K
General Retailers: TJX TIF HRB LOW COST.O BBY ROL HD TGT ROST.O
Health Care Equipment and Services: XRAY.O UHS BDX HUM CI BAX SYK COO MDT ABMD.O UNH
Household Goods and Home Construction: LEG PG CLX NWL.O CHD PHM WHR
Industrial Engineering: CMI SWK FLS CAT IR DE ROK PNR PCAR.O SNA
Industrial Metals and Mining: NUE
Industrial Transportation: FDX CSX.O UNP
Leisure Goods: HAS.O EA.O
Life Insurance: LNC AFL
Media: OMC CMCSA.O IPG
Mining: NEM
Nonlife Insurance: PGR AJG L MMC CB CINF.O AIG AON
Oil Equipment and Services: OKE BHGE.K HP WMB SLB
Oil and Gas Producers: APA MRO COP OXY HFC XOM VLO CVX HES
Personal Goods: VFC CL KMB PVH NKE
Pharmaceuticals and Biotechnology: PFE CAH MYL.O AMGN.O LLY CELG.O AGN JNJ MRK
Real Estate Investment Trusts: WY DRE UDR PSA WELL.K HST FRT
Software and Computer Services: MSFT.O ADSK.O SYMC.O CDNS.O CERN.O ADBE.O IBM ORCL.K
Support Services: PAYX.O JCI XRX WM RHI ADP.O CTAS.O GWW FISV.O
Technology Hardware and Equipment: AMD.O AAPL.O GLW LRCX.O AMAT.O MXIM.O SWKS.O INTC.O ADI.O TXN.O KLAC.O HPQ MU.O MSI WDC.O
Tobacco: MO
Travel and Leisure: ALK CCL MGM LUV

List of Fundamentals

Fundamentals %-electric net-change-in-cash %-fee-revenue net-debt-incl-pref-stock-min-interest %-gas net-income %-lt-debt-to-total-capital net-income-after-taxes acceptances-outstanding net-income-before-extra-items accounting-change net-income-before-taxes accounts-payable net-income-starting-line accounts-receivable net-interest-inc-after-loan-loss-prov accounts-receivable-trade-gross net-interest-income accounts-receivable-trade-net net-investment-income accrued-expenses net-loans accrued-investment-income net-margin accumulated-depreciation-total net-premiums-earned acquisition-of-business net-revenues additional-paid-in-capital net-sales advertising-expense non-cash-items advertising-expense-supplemental non-interest-expense- amort-of-acquisition-costs-supplemental non-interest-income-bank amort-of-intangibles-supplemental non-interest-income-op-inc amortization normalized-ebit amortization-of-acquisition-costs normalized-ebitda amortization-of-intangibles normalized-inc-avail-to-com amortization-of-policy-acquisition-costs normalized-income-after-taxes assets-equity normalized-income-before-taxes bank-total-revenue note-receivable-long-term basic-eps-excluding-extraordinary-items notes-payable-short-term-debt basic-eps-including-extraordinary-items notes-receivable-short-term basic-normalized-eps operating-income basic-weighted-average-shares operating-margin buildings-gross operations-maintenance capital-expenditures options-exercised capital-lease-obligations oreo-%-of-total-loans cash other-assets cash-and-short-term-investments other-assets-liabilities-net cash-dividends-paid-common other-assets-total cash-dividends-paid-preferred other-bearing-liabilities-total cash-due-from-banks other-comprehensive-income cash-equivalents other-current-assets cash-from-financing-activities other-current-assets-total cash-from-investing-activities other-current-liabilities cash-from-operating-activities other-current-liabilities-total cash-interest-paid 
other-earning-assets-total cash-taxes-paid changes-in-working-capital other-equity common-stock other-equity-total common-stock-net other-expense common-stock-total other-financing-cash-flow construction-in-progress-gross other-insurance-revenue convertible-preferred-stock-non-rdmbl other-interest-income cost-of-revenue other-investing-cash-flow cost-of-revenue-total other-investing-cash-flow-items-total other-liabilities

Fundamentals current-port-of-lt-debt-capital-leases other-liabilities-total current-ratio other-long-term-assets customer-acceptances other-long-term-assets-total customer-advances other-long-term-liabilities debt-equity other-net defered-income-tax-long-term-asset other-net-1 deferred-charges other-net-2 deferred-gas-cost other-non-cash-items deferred-income-tax other-non-insurance-revenue deferred-income-tax-current-asset other-non-operating-income-expense deferred-income-tax-current-liability other-non-utility-revenue deferred-income-tax-lt-liability other-operating-cash-flow deferred-investment-tax-credit other-operating-expense deferred-policy-acquisition-costs other-operating-expenses-total deferred-revenue-current other-payables deferred-taxes other-policyholders’-funds deposits other-property-plant-equipment-gross depreciation other-property-plant-equipment-net depreciation-amortization other-real-estate-owned depreciation-depletion other-revenue depreciation-supplemental other-revenue-total diluted-eps-excluding-extraord-items other-short-term-borrowings diluted-eps-including-extraord-items other-short-term-investments diluted-net-income other-unusual-expense-income diluted-normalized-eps other-utility-revenue diluted-weighted-average-shares payable-accrued dilution-adjustment pension-benefits-overfunded discontinued-operations pension-benefits-underfunded discontinued-operations-curr-liability policy-benefits-liabilities discontinued-operations-liabilities policy-liabilities discontinued-operations-lt-asset policy-liabilities-1 discountinued-operations-current-asset policy-liabilities-2 dividends-payable preferred-dividends dividends-per-share-com-stock-issue-2 preferred-stock-net dividends-per-share-com-stock-issue-3 preferred-stock-non-redeemable dividends-per-share-com-stock-issue-4 preferred-stock-non-redeemable-net dps-common-stock-primary-issue prepaid-expenses ebitda-margin pretax-margin effect-of-special-items-on-income-taxes property-other-taxes 
effective-tax-rate property-plant-equipment-total-gross efficiency-ratio property-plant-equipment-total-net electric-operations provision-for-doubtful-accounts eop-loans-eop-deposits provision-for-income-taxes equity-in-affiliates purchase-acquisition-of-intangibles equity-in-affiliates-supplemental purchase-of-fixed-assets equity-in-net-earnings-loss purchase-of-investments esop-debt-guarantee purchased-power excise-taxes-payments purchased-r-d extraordinary-item purchased-r-d-written-off federal-funds-repos quick-ratio fedfundspurch-scrtysoldunderrepurchagrmt realized-unrealized-gains-losses fedfundssold-scrtypurch-underresaleagrmt receivables-other fees-commissions-from-operations redeemable-convertible-preferred-stock

Fundamentals fhlb-borrowings redeemable-preferred-stock financing-cash-flow-items redeemable-preferred-stock-total foreclosed-real-estate -asset foreign-exchange-effects reinsurance-payable foreign-pension-plan-expense reinsurance-receivable free-cash-flow rental-expense-supplemental fuel-expense repurch-retirement-of-common-preferred fuel-inventory repurchase-retirement-of-common fuel-purchased-for-resale repurchase-retirement-of-preferred gain-loss-on-sale-of-assets research-development gas-in-storage-inventory research-development-exp-supplemental gas-operations restricted-cash-current goodwill-gross restricted-cash-long-term goodwill-net restructuring-charge gross-dividends-common-stock retained-earnings-accumulated-deficit gross-margin revenue gross-profit sale-issuance-of-common gross-revenue sale-issuance-of-common-preferred impairment-assets-held-for-sale sale-issuance-of-preferred impairment-assets-held-for-use sale-maturity-of-investment inc-tax-ex-impact-of-sp-items sale-of-business income-available-to-com-excl-extraord sale-of-fixed-assets income-available-to-com-incl-extraord sale-of-loans income-taxes-payable sales-returns-and-allowances insurance-commissions-fees-premiums securities-%-avg-earning-assets insurance-receivables securities-for-sale insurance-reserves securities-held intangible-net security-deposits intangibles-gross selling-general-admin-expenses-total intangibles-net selling-general-administrative-expense interest-adjustment-primary-eps separate-accounts-assets interest-dividends-on-investment-secs separate-accounts-liability interest-earning-deposits service-cost-foreign interest-exp-inc-net-operating-total shares-outs-common-stock-primary-issue interest-expense-financial-oper-suppl shares-outstanding-common-issue-2 interest-expense-income-net-operating shares-outstanding-preferred-issue-1 interest-expense-net-non-operating short-term-debt-issued interest-expense-net-operating short-term-debt-net interest-expense-non-operating 
short-term-debt-reduction interest-expense-operating short-term-investments interest-expense-supplemental software-development-costs interest-fees-on-loans steam-operations interest-inc-exp-net-non-op-total tangible-book-value-common-equity interest-income-bank tax-on-extraordinary-items interest-income-exp-net-non-operating taxes-payable interest-income-non-bank times-interest-earned interest-income-non-operating total-adjustments-to-net-income interest-income-operating total-assets interest-invest-income-non-operating total-cash-dividends-paid interest-investment-income-operating total-common-shares-outstanding interest-on-deposit total-current-assets interest-on-deposits total-current-assets-less-inventory interest-on-other-borrowings total-current-liabilities

Fundamentals interest-receivable total-debt inventories total-debt-issued inventories-finished-goods total-debt-reduction inventories-other total-deposits inventories-raw-materials total-equity inventories-work-in-progress total-equity-minority-interest investment-income-non-operating total-extraordinary-items investment-income-operating total-gross-loans investment-net total-interest-expense investment-securities-gains total-inventory investment-securities-gains-losses total-investment-securities investment-securities-losses total-liabilities issuance-retirement-of-debt-net total-liabilities-shareholders’-equity issuance-retirement-of-stock-net total-long-term-debt labor-related-expense total-long-term-debt-supplemental labor-related-expense-suppl total-operating-expense land-improvements-gross total-pension-expense litigation total-preferred-shares-outstanding loan-loss-allowances total-premiums-earned loan-loss-provision total-receivables-net loans total-revenue loans-gains-losses total-short-term-borrowings loans-held-for-sale total-special-items loans-origination-operating total-utility-plant-net long-term-debt trading-account-assets long-term-debt-issued trading-account-interest long-term-debt-matur-in-year-6-beyond translation-adjustment long-term-debt-maturing-in-2-3-years treas-shares-common-stock-prmry-issue long-term-debt-maturing-in-year-2 treasury-stock long-term-debt-maturing-within-1-year treasury-stock-common long-term-debt-net unbilled-utility-revenues long-term-debt-reduction underwriting-commissions long-term-investments unearned-income loss-adjustment unearned-premium-unearned-revenue loss-gain-on-sale-of-assets-operating unearned-premiums losses-benefits-and-adjustments unrealized-gain-loss losses-benefits-and-adjustments-total unusual-expense-income lt-investment-affiliate-companies unusual-items lt-investments-other utility-plant-accumulated-depreciation machinery-equipment-gross utility-plant-gross minimum-pension-liability-adjustment 
utility-plant-net minority-interest utility-revenue-as-%-total-revenue minority-interest-non-redeemable warrants-converted minority-interest-redeemable water-operations minority-interest-supplemental x-earnings-retention miscellaneous-earnings-adjustment x-leverage-assets-equity natural-resources-gross x-pretax-margin net-cash-beginning-balance x-tax-complement

Appendix B

Distributions

Fundamental Data Distributions by Sector

B.1 Joint distributions

(a) Aerospace and Defense

(b) Food Producers

(c) Life Insurance

(d) Software and Computer Services

Figure B.1: T-test statistics for the joint distributions of the Enterprise Value/EBITDA multiple for different sectors

(a) Financial Services

(b) Food Producers

(c) Household Goods and Home Construction

(d) Personal Goods

Figure B.2: T-test statistics for the joint distributions of the P/E ratio for different sectors

B.2 Canonical averaging of the volatility and return time series conditional on financial ratios

B.3 Copulas

Appendix C

Experiments

C.1 Additional Plots for 4.2

Industry Performance Plots


Figure C.1: Industry Performance

C.2 Additional Plots for 4.2.4

Log Return Difference

(a) period = 10% (b) period = 30%

(c) period = 50% (d) period = 100%

(e) period = 150%

Figure C.2: Dynamic Sensitivity analysis for LR portfolio

Information Ratio

(a) period = 10% (b) period = 30%

(c) period = 40% (d) period = 50%

(e) period = 100% (f) period = 150%

Figure C.3: Dynamic Sensitivity analysis for IR portfolio

Sharpe Ratio

(a) period = 10% (b) period = 30%

(c) period = 40% (d) period = 50%

(e) period = 60% (f) period = 80%

Figure C.4: Dynamic Sensitivity analysis for SR portfolio

C.3 Classifiers for Relative Features

Profit and Loss

Total number of quarters: 23325
Number of quarters with positive return: 11606
Number of quarters with negative return: 11719
Total correct guesses: 12060
Correct guesses: 51.70%
Significance threshold, α = 5%: 50.53%
Significance threshold, α = 1%: 50.69%
Significance threshold, α = 0.1%: 51.01%
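The thresholds above are the minimum accuracies a purely random classifier would exceed with probability α, i.e. a one-sided test of the null hypothesis that the classifier guesses at random (p = 0.5). A minimal sketch under that assumption, using the normal approximation to the binomial distribution (the helper `critical_accuracy` is hypothetical, not code from the thesis):

```python
from math import sqrt
from statistics import NormalDist  # Python 3.8+

def critical_accuracy(n: int, alpha: float) -> float:
    """Smallest hit rate that is significant at level alpha under a
    one-sided test of H0: p = 0.5 (random guessing), using the normal
    approximation to the binomial distribution."""
    z = NormalDist().inv_cdf(1.0 - alpha)   # one-sided z quantile
    return 0.5 + z * sqrt(0.25 / n)         # 0.5 + z * std. error of the hit rate

n = 23325  # quarters in the test set
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha:.1%}: accuracy must exceed {critical_accuracy(n, alpha):.2%}")
```

With n = 23325 this yields roughly 50.5% at α = 5% and roughly 51.0% at α = 0.1%, close to the reported thresholds (small differences may come from using the exact binomial quantile instead of the normal approximation); the observed hit rate of 51.70% clears all three levels.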

Outperforming and Underperforming

Total number of quarters: 23325
Number of quarters with positive return: 14008
Number of quarters with negative return: 9317
Total correct guesses: 12836
Correct guesses: 55.03%
Significance threshold, α = 5%: 50.53%
Significance threshold, α = 1%: 50.74%
Significance threshold, α = 0.1%: 51.09%

Department of Management, Technology, and Economics
Chair of Entrepreneurial Risks
Prof. Dr. Didier Sornette

Title of work: Beating the Market: A Quantitative Approach to Fundamental Investing

Thesis type and date: Master Thesis, May 2020

Supervision: Sumit Kumar Ram, Prof. Dr. Didier Sornette

Student:
Name: Niklas Lappe
E-mail: [email protected]
Legi-Nr.: 14-941-538
Semester: SS 2020

Statement regarding plagiarism: By signing this statement, I affirm that I have read and signed the Declaration of Originality, independently produced this paper, and adhered to the general practice of source citation in this subject-area.

Declaration of Originality: http://www.ethz.ch/faculty/exams/plagiarism/confirmation_en.pdf

Zurich, 18. 5. 2020: