Fundamental Analysis and the Cross-Section of Stock Returns: a Data-Mining Approach
Abstract

A key challenge in evaluating data-mining bias in stock return anomalies is that we do not observe all the variables considered by researchers. We overcome this challenge by constructing a “universe” of fundamental signals from financial statements and by using a bootstrap approach to measure the impact of data mining. We find that many fundamental signals are significant predictors of cross-sectional stock returns even after accounting for data mining. This predictive ability is more pronounced following high-sentiment periods, during earnings-announcement days, and among stocks with greater limits-to-arbitrage. Our evidence suggests that fundamental-based anomalies are not a product of data mining, and they are best explained by mispricing. Our approach is general and can be applied to other categories of anomaly variables.

October 2015

“Economists place a premium on the discovery of puzzles, which in the context at hand amounts to finding apparent rejections of a widely accepted theory of stock market behavior.” Merton (1987, p. 104)

1. Introduction

Finance researchers have devoted a considerable amount of time and effort to searching for stock return patterns that cannot be explained by traditional asset pricing models. As a result of these efforts, there is now a large body of literature reporting hundreds of cross-sectional return anomalies (Green, Hand, and Zhang (2013), Harvey, Liu, and Zhu (HLZ, 2014), and McLean and Pontiff (2014)). An important debate in the literature is whether the abnormal returns documented in these studies are compensation for systematic risk, evidence of market inefficiency, or simply the result of extensive data mining.

Data-mining concerns arise because “the more scrutiny a collection of data is subjected to, the more likely will interesting (spurious) patterns emerge” (Lo and MacKinlay (1990, p. 432)).
Intuitively, if enough variables are considered, then by pure chance some of these variables will generate abnormal returns even if they do not genuinely have any predictive ability for future stock returns. Lo and MacKinlay contend that the degree of data-mining bias increases with the number of studies published on the topic. The cross section of stock returns is arguably the most researched and published topic in finance; hence, the potential for spurious findings is also the greatest.

Although researchers have long recognized the potential danger of data mining, few studies have examined its impact on a broad set of cross-sectional stock return anomalies.[1] The lack of research in this area is in part due to the difficulty of accounting for all the anomaly variables that have been considered by researchers. Although one can easily identify published variables, one cannot observe the numerous variables that have been tried but not published or reported due to the “publication bias.”[2] In this paper, we overcome this challenge by examining a large and important class of anomaly variables, i.e., fundamental-based variables, for which a “universe” can be reasonably constructed.

[1] The exceptions are HLZ (2014) and McLean and Pontiff (2014). We note that many papers have examined the impact of data mining on individual anomalies (e.g., Jegadeesh and Titman (2001)).

We focus on fundamental-based variables, i.e., variables derived from financial statements, for several reasons. First, many prominent anomalies such as the asset growth anomaly (Cooper, Gulen, and Schill (2008)) and the gross profitability anomaly (Novy-Marx (2013)) are based on financial statement variables. HLZ (2014) report that accounting variables represent the largest group among all the published cross-sectional return predictors. Second, researchers have considerable discretion over the selection and construction of fundamental signals. As such, there is ample opportunity for data snooping.
Third and most importantly, although there are hundreds of financial statement variables and numerous ways of combining them, we can construct a “universe” of fundamental signals by using permutational arguments. The ability to construct such a universe is important because in order to account for the effects of data mining, one should not only include variables that were reported, but also variables that were considered but unreported (Sullivan, Timmermann, and White (2001)). Financial statement variables are ideally suited for such an analysis.

We construct a universe of fundamental signals by imitating the search process of a data snooper. We start with all accounting variables in Compustat that have a sufficient amount of data. We then use permutational arguments to construct over 18,000 fundamental signals. We choose the functional forms of these signals by following the previous academic literature and industry practice, but make no attempt to select specific signals based on what we think (or know) should work. Our construction design ensures a comprehensive sample that does not bias our search in any particular direction.

[2] The publication bias refers to the fact that it is difficult to publish a non-result (HLZ (2014)).

We form long-short portfolios based on each fundamental signal and assess the significance of long-short hedge returns by using a bootstrap procedure. The bootstrap approach is desirable in our context for several reasons. First, long-short returns are highly non-normal. Second, long-short returns across fundamental signals exhibit complex dependencies. Third, evaluating the performance of a large number of fundamental signals involves a multiple comparison problem. We follow Fama and French (2010) and randomly sample time periods with replacement. That is, we draw the entire cross section of anomaly returns for each time period.
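The time-period resampling just described can be illustrated with a minimal Python sketch. It is not the paper's procedure: it replaces factor-model alphas with simple mean returns, imposes a zero-mean null by demeaning each signal's return series, and uses nearest-rank percentiles. The function names (`t_stat`, `percentile`, `bootstrap_p_value`) and the toy data are hypothetical.

```python
import math
import random
import statistics


def t_stat(returns):
    """t-statistic of the mean of a return series (H0: mean = 0)."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def percentile(values, q):
    """q-th percentile (0-100) by nearest rank on the sorted values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def bootstrap_p_value(returns, q=99, n_boot=200, seed=0):
    """returns[t][s] is the long-short return of signal s in period t.

    Returns (actual q-th percentile of t-stats, bootstrapped p-value).
    """
    rng = random.Random(seed)
    n_periods, n_signals = len(returns), len(returns[0])
    columns = [[row[s] for row in returns] for s in range(n_signals)]
    actual = percentile([t_stat(col) for col in columns], q)

    # Assumption: impose the null by subtracting each signal's own mean,
    # so every simulated series has a true mean of zero.
    means = [statistics.mean(col) for col in columns]
    null_rows = [[row[s] - means[s] for s in range(n_signals)] for row in returns]

    exceed = 0
    for _ in range(n_boot):
        # Draw whole time periods with replacement: every signal receives the
        # same resampled dates, so cross-signal dependence is preserved.
        idx = [rng.randrange(n_periods) for _ in range(n_periods)]
        sim_cols = [[null_rows[t][s] for t in idx] for s in range(n_signals)]
        if percentile([t_stat(col) for col in sim_cols], q) >= actual:
            exceed += 1
    return actual, exceed / n_boot


# Toy data (hypothetical): 60 periods, 20 signals sharing a common shock.
data_rng = random.Random(1)
toy = []
for _ in range(60):
    common = data_rng.gauss(0, 1)
    toy.append([common + data_rng.gauss(0, 1) for _ in range(20)])

actual_pct, p_value = bootstrap_p_value(toy, q=99, n_boot=200)
print(f"actual 99th-percentile t-stat: {actual_pct:.2f}, p-value: {p_value:.3f}")
```

Resampling row indices rather than individual returns is the design choice that matters here: each simulated sample keeps the full cross section of a given period together, so correlated signals stay correlated in every bootstrap run.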
The simulated returns have the same properties as the actual returns except that we set the true alpha for the simulated returns to zero. We follow many previous studies and conduct our bootstrap analysis on the t-statistics of alphas because the t-statistic is a pivotal statistic and has better sampling properties than the alpha. By comparing the cross-section of actual t-statistics with that of simulated t-statistics, we are able to assess the extent to which the observed performance of top-ranked signals is due to sampling error (i.e., data mining).

Our results indicate that the top-ranked fundamental signals in our sample exhibit superior long-short performance that is not due to sampling variation. The bootstrapped p-values for the extreme percentiles of t-statistics are all less than 5%. For example, the 99th percentile of t-statistics for equal-weighted 4-factor alphas is 6.28 for the actual data. In comparison, none of the simulation runs has a 99th percentile of t-statistics as high as 6.28, indicating that we would not expect to find such extreme t-statistics under the null hypothesis of no predictive ability. The results for value-weighted returns are qualitatively similar. For value-weighted returns, the 99th percentile of t-statistics for the actual data is 3.29, with a bootstrapped p-value of 0.015, which indicates that only 1.5% of the simulation runs produce a 99th percentile of t-statistics higher than 3.29. Overall, our bootstrap results strongly suggest that the superior performance of the top fundamental signals cannot be attributed to pure chance.

We divide our sample period into two halves and find that our main results hold in both sub-periods. More importantly, we find strong evidence of performance persistence. Signals ranked in the extreme quintiles during the first half of the sample period are more likely to stay in the same quintile during the second half of the sample period than to switch to the opposite quintile.
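The quintile-persistence comparison described above can be sketched as follows. This is a hedged illustration rather than the paper's exact test: the helper names (`quintile_ranks`, `persistence`) and the example t-statistics are hypothetical, and ties are broken by sort order.

```python
def quintile_ranks(values):
    """Assign each value a quintile from 1 (lowest) to 5 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for pos, i in enumerate(order):
        ranks[i] = pos * 5 // len(values) + 1
    return ranks


def persistence(first_half_t, second_half_t):
    """For signals in an extreme first-half quintile (1 or 5), return the
    share that stays in the same quintile and the share that switches to
    the opposite extreme quintile in the second half."""
    q1 = quintile_ranks(first_half_t)
    q2 = quintile_ranks(second_half_t)
    extreme = [i for i, q in enumerate(q1) if q in (1, 5)]
    stay = sum(q2[i] == q1[i] for i in extreme)
    switch = sum(q2[i] == 6 - q1[i] for i in extreme)
    return stay / len(extreme), switch / len(extreme)


# Hypothetical alpha t-statistics for ten signals in each sub-period.
first = [3.2, -2.8, 0.1, 1.5, -0.4, 2.9, -3.1, 0.6, -1.2, 0.9]
second = [2.5, -1.9, 0.3, 0.2, -0.6, 3.0, -2.2, 1.1, -0.1, 0.4]
stay_share, switch_share = persistence(first, second)
print(f"stay: {stay_share:.2f}, switch: {switch_share:.2f}")
```

If rankings in the two halves were unrelated, a signal starting in an extreme quintile would land in any given second-half quintile about 20% of the time, so the stay and switch shares would both be near 0.2. A stay share well above the switch share is the pattern the persistence argument relies on.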
In addition, sorting signals on their alpha t-statistics from the first sub-period yields a significant spread in long-short returns during the second sub-period. These results provide further evidence that the predictive ability of fundamental signals is unlikely to be driven by data mining.

Our results are robust. We find qualitatively similar results when we apply our bootstrap procedure to alphas instead of t-statistics. That is, the extreme percentiles of actual alphas are significantly higher than their counterparts in the simulated data. Our results are also robust to alternative universes of fundamental signals. In particular, we obtain similar results when we impose more (or less) stringent data requirements on accounting variables. Our results are also unchanged when we use industry-adjusted financial ratios to construct fundamental signals. Finally, our main findings hold for small as well as large stocks.

Having shown that fundamental-based anomalies are not a result of data mining, we next investigate whether they are consistent with mispricing-based explanations. We perform three tests. First, behavioral arguments suggest that if the abnormal returns to fundamental-based trading strategies arise from mispricing, then they should be more pronounced among stocks with greater limits to arbitrage. Consistent with this prediction, we find that the t-statistics for top-performing fundamental signals are significantly higher among small, low-institutional-ownership, high-idiosyncratic-volatility, and low-analyst-coverage stocks. Second, to the extent that fundamental-based anomalies are driven by mispricing (and primarily by overpricing), anomaly returns should be significantly higher following high-sentiment periods (Stambaugh, Yuan, and Yu (2012)). We find strong evidence consistent with this prediction.
Third, behavioral theories suggest that predictable stock returns arise from corrections of mispricing and that price corrections are more likely to occur around earnings announcement periods, when investors update their prior beliefs (La Porta et al. (1997) and Bernard, Thomas, and Wahlen (1997)). As such, we should expect the anomaly returns to be significantly higher during earnings announcement periods. Our results support this prediction.

Our paper adds to the literature on fundamental analysis. Ou and Penman (1989) show that an array of financial ratios can predict future earnings changes and stock returns.