The Illusion of Predictability: How Regression Statistics Mislead Experts Emre Soyer A, Robin M

International Journal of Forecasting 28 (2012) 695–711 Contents lists available at SciVerse ScienceDirect International Journal of Forecasting journal homepage: www.elsevier.com/locate/ijforecast The illusion of predictability: How regression statistics mislead experts Emre Soyer a, Robin M. Hogarth a,b,∗ a Universitat Pompeu Fabra, Department of Economics & Business, Ramon Trias Fargas 25-27, 08005, Barcelona, Spain b ICREA, Barcelona, Spain article info a b s t r a c t Keywords: Does the manner in which results are presented in empirical studies affect perceptions Regression of the predictability of the outcomes? Noting the predominant role of linear regression Presentation formats analysis in empirical economics, we asked 257 academic economists to make probabilistic Probabilistic inference Prediction inferences based on different presentations of the outputs of this statistical tool. The Graphics questions concerned the distribution of the dependent variable, conditional on known Uncertainty values of the independent variable. The answers based on the presentation mode that is standard in the literature demonstrated an illusion of predictability; the outcomes were perceived to be more predictable than could be justified by the model. In particular, many respondents failed to take the error term into account. Adding graphs did not improve the inference. Paradoxically, the respondents were more accurate when only graphs were provided (i.e., no regression statistics). The implications of our study suggest, inter alia, the need to reconsider the way in which empirical results are presented, and the possible provision of easy-to-use simulation tools that would enable readers of empirical papers to make accurate inferences. ' 2012 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. 1. Introduction between the independent and dependent variables. It is also essential to acknowledge the level of uncertainty in- Much academic research in empirical economics inherent in outcomes of the dependent variable, conditional volves determining whether or not one or several variables on values of the independent variable. For example, con- have causal effects on another variable. The statistical tool sider a decision maker who is pondering which actions to used for making such affirmations is typically regression take and how much to do so in order to reach a certain goal. analysis, where the terms `ìndependent'' and ``dependent'' This requires conjectures to be formed about the individ- are used to distinguish cause(s) from outcomes. The results ual outcomes that would result from specific inputs. More- from most analyses consist of statements as to whether over, the answers to these questions depend not only on or not particular independent variables are ``significant'' estimating average effects, but also on the distribution of in affecting outcomes (the dependent variable), and most possible effects around the average. discussions of the importance of such variables focus on In this paper, we argue that the emphasis placed on the `àverage'' effects on outcomes of possible changes in determining average causal effects in the economics liter- inputs. However, if the analysis is used for prediction, em- ature limits our ability to make correct probabilistic fore- phasizing only statistically significant average effects re- casts. In particular, the way in which results are presented sults in an incomplete characterization of the relationship in regression analyses obfuscates the uncertainty inherent in the dependent variable. As a consequence, consumers of ∗ the economic literature can be subject to what we call the Corresponding author at: Universitat Pompeu Fabra, Department of `ìllusion of predictability''. Economics & Business, Ramon Trias Fargas 25-27, 08005, Barcelona, Spain. Tel.: +34 93 5422561; fax: +34 93 5421746. Whereas it can be argued that the way in which in- E-mail addresses: [email protected] (E. Soyer), formation is presented should not affect rational inter- [email protected] (R.M. Hogarth). pretation and analysis, there is abundant psychological 0169-2070/$ – see front matter ' 2012 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved. doi:10.1016/j.ijforecast.2012.02.002 696 E. Soyer, R.M. Hogarth / International Journal of Forecasting 28 (2012) 695–711 evidence demonstrating that such presentation effects do to improve inferences. However, presenting the results occur. Many studies have shown, for example, the way in in graphical fashion alone improved the accuracy. The which subtle changes in questions designed to elicit prefer- implications of our findings, including suggested ways of ences are subject to contextual influences (see, e.g., Kahne- improving statistical reporting, are discussed in Section5. man& Tversky, 1979). Moreover, these have been reported in both controlled laboratory conditions and field stud- 2. Current practice ies involving appropriately motivated experts (Camerer, 2000; Thaler& Sunstein, 2008). The human information There are many sources of empirical analyses in processing capacity is limited, and the manner in which economics. In order to obtain a representative sample of attention is allocated has important implications for both current practice, we selected all of the articles published revealed preferences and inferences (Simon, 1978). in the 3rd issues (of each year) of four leading journals Recently, Gigerenzer and his colleagues (Gigerenzer, between 1998 and 2007 (441 articles). The journals were Gaissmaier, Kurz-Milcke, Schwartz,& Woloshin, 2007) American Economic Review (AER), Quarterly Journal of reviewed research on how probabilities and statistical in- Economics (QJE), Review of Economic Studies (RES) and formation are presented, and consequently perceived, by Journal of Political Economy (JPE). Among these articles, individuals or specific groups that use them frequently in we excluded those with time series analyses, and only their decisions. They show that mistakes in probabilistic included those with cross-sectional analyses where the reasoning and the miscommunication of statistical infor- authors identify one or more independent variables as mation are common. Their work focuses mainly on the statistically significant causes of relevant economic and fields of medicine and law, where doctors, lawyers and social outcomes. Our aim is to determine how the judges fail to communicate crucial statistical information consumers of this literature translate the findings about appropriately in particular situations, thereby leading to average causal effects into perceptions of predictability. biased judgments that have a negative impact on others. Many of the articles published in these journals are em- One such example is the failure of gynecologists to infer pirical. Over 70% of the empirical analyses use variations the probability of cancer correctly, given the way in which of regression analysis, of which 75% have linear specifi- mammography results are communicated. cations. Regression analysis is clearly the most prominent We examine the way in which economists communi- tool used by economists to test hypotheses and identify re- cate statistical information. Specifically, we note that much lationships among economic and social variables. of the work in empirical economics involves the estima- In economics journals, empirical studies follow a tion of average causal effects through the technique of common procedure for displaying and evaluating results. regression analysis. However, when we asked a large sam- Typically, authors provide a table that displays the ple of economists to use the standard reported outputs of descriptive statistics of the sample used in the analysis. the simplest form of regression analysis to make proba- Either before or after this display, they describe the bilistic forecasts for decision making purposes, nearly 70% specification of the model on which the analysis is based, of them experienced difficulties. The reason for this, we then provide the regression results in detailed tables. In believe, is that current reporting practices focus attention most cases, these results include the coefficient estimates on the uncertainty surrounding the model parameter esti- and their standard errors, along with other frequently mates, and fail to highlight the uncertainty concerning out- reported statistics, such as the number of observations and comes of the dependent variable conditional on the model the R2 values. identified. On the other hand, when attention was directed Table 1 summarizes these details for the sample of appropriately – by graphical as opposed to tabular means studies referred to above. It shows that, apart from – over 90% of our respondents made accurate inferences. the regression coefficients and their standard errors (or In the next section, we provide some background on t-statistics), there is not much agreement as to what the practice and evolution of reporting empirical results in else should be reported. The data therefore suggest that economics journals. In Section3 we provide information economists probably understand the inferences that can concerning the survey we conducted with economists, be made about regression coefficients or the average which involved them answering four decision-oriented impact of manipulating an independent variable quite questions based on a standard format for reporting the well; however, their ability to make inferences

The Illusion of Predictability: How Regression Statistics Mislead Experts Emre Soyer A, Robin M

Wavelet Entropy Energy Measure (WEEM): a Multiscale Measure to Grade a Geophysical System's Predictability

Biostatistics for Oral Healthcare

Big Data for Reliability Engineering: Threat and Opportunity

Volatility Estimation of Stock Prices Using Garch Method

A Python Library for Neuroimaging Based Machine Learning

BIOSTATS Documentation

Public Health & Intelligence

Measuring Predictability: Theory and Macroeconomic Applications," Journal of Applied Econometrics, 16, 657-669

Central Limit Theorems When Data Are Dependent: Addressing the Pedagogical Gaps

Measuring Complexity and Predictability of Time Series with Flexible Multiscale Entropy for Sensor Networks

Incubation Periods Impact the Spatial Predictability of Cholera and Ebola Outbreaks in Sierra Leone

Understanding Predictability