Prediction Intervals for Macroeconomic Variables Based on Uncertainty Measures

Pei-Hua (Peggy) Hsieh ANR: 591577

Academic supervisor: Prof. Dr. Bertrand Melenberg
Second reader: Prof. Dr. Pavel Čížek
Internship supervisor: Dr. Adam Elbourne

A thesis submitted in partial fulfillment of the requirements for the degree: Master of Science in Econometrics and Mathematical Economics

Tilburg School of Economics and Management, Tilburg University, The Netherlands. Date: September 26, 2016

Pei-Hua (Peggy) Hsieh ∗

October 7, 2016

Abstract

This paper develops a method for incorporating uncertainty measures into the formation of confidence intervals around predictions of macroeconomic variables. These intervals are used for plotting fan charts, a graphical representation of forecasts pioneered by the Bank of England, which is commonly employed by central banks. I demonstrate and evaluate the proposed method by using it to plot fan charts for the real GDP growth rate in the Netherlands. This paper aims to provide concrete recommendations for the Centraal Planbureau (CPB).

∗I am indebted to Prof. Dr. Bertrand Melenberg for his inspiring guidance and support. I would also like to thank my supervisor from the CPB, Dr. Adam Elbourne, for providing me with data and helpful comments. Finally, I thank my family, instructors at Tilburg University and colleagues from the CPB.

Contents

1 Introduction
2 Methodology
 2.1 Step 1
 2.2 Step 2
 2.3 Step 3
3 Data description
4 Summary Statistics
 4.1 Dependent Variable
 4.2 Explanatory Variables
5 Empirical results
 5.1 Short dataset: using data from Consensus Economics
 5.2 Specification 1
  5.2.1 Step 1 - The effect of uncertainty on forecasting errors
  5.2.2 Step 2 - Uncertainty measures prediction intervals
  5.2.3 Step 3 - Real GDP growth rate prediction intervals
 5.3 Specification 2
 5.4 Comparison with CPB fan chart
6 Conclusion
Appendix

List of Tables

1 Summary statistics of explanatory variables
2 Correlogram of explanatory variables
3 Regression results: short data (MAE)
4 Regression results: short data (RMSE)
5 Explanatory variables correlogram (short data)
6 Dependent variable: forecast error (Mean absolute error)
7 Dependent variable: forecast error (RMSE)
8 Results from Dickey-Fuller tests
9 Results from KPSS tests
10 Model selection
11 Prediction summary
12 Dependent variable: forecast error (Mean absolute error)
13 Dependent variable: forecast error (RMSE)
14 Prediction summary (specification 2)

List of Figures

1 Standard deviation of within-country forecast error: before and after 2008
2 Bank of England Fan Chart for CPI Inflation
3 Forecast performance score by country (1975-2014)
4 Forecast performance score by country (2004-2014)
5 Aggregate measures of forecast error
6 Summary statistics of forecast error
7 CPB predictions and fitted aggregate forecast errors (spring, RMSE)
8 Raw time series: oil
9 Raw time series: esi
10 Raw time series: news
11 Raw time series: vox
12 ACF and PACF for oil
13 ACF and PACF for news
14 ACF and PACF for esi
15 ACF and PACF for vox
16 Selected model: oil
17 Selected model: news
18 Selected model: esi
19 Selected model: vox
20 Residuals: oil
21 Residuals: esi
22 Residuals: news
23 Residuals: vox
24 Tests for white noise in the residuals
25 Fan chart using MAE, December forecasts
26 Fan chart using RMSE, December forecasts
27 CPB predictions and fitted aggregate forecast errors (spring, RMSE) - Specification 2
28 Fan chart using MAE, December forecasts (Specification 2)
29 Fan chart using RMSE, December forecasts (Specification 2)
30 CPB fan chart (September 20, 2016)
31 CPB predictions and fitted aggregate forecast errors (spring, MAE)
32 CPB predictions and fitted aggregate forecast errors (June, MAE)
33 CPB predictions and fitted aggregate forecast errors (September, MAE)
34 CPB predictions and fitted aggregate forecast errors (December, MAE)
35 CPB predictions and fitted aggregate forecast errors (June, RMSE)
36 CPB predictions and fitted aggregate forecast errors (September, RMSE)
37 CPB predictions and fitted aggregate forecast errors (December, RMSE)
38 CPB predictions and fitted aggregate forecast errors (June, MAE) (Specification 2)
39 CPB predictions and fitted aggregate forecast errors (September, MAE) (Specification 2)
40 CPB predictions and fitted aggregate forecast errors (December, MAE) (Specification 2)
41 CPB predictions and fitted aggregate forecast errors (spring, RMSE) (Specification 2)
42 CPB predictions and fitted aggregate forecast errors (June, RMSE) (Specification 2)
43 CPB predictions and fitted aggregate forecast errors (September, RMSE) (Specification 2)
44 CPB predictions and fitted aggregate forecast errors (December, RMSE) (Specification 2)
45 Fan chart using MAE, Spring forecasts
46 Fan chart using RMSE, Spring forecasts
47 Fan chart using MAE, June forecasts
48 Fan chart using RMSE, June forecasts
49 Fan chart using MAE, September forecasts
50 Fan chart using RMSE, September forecasts
51 CPB predictions and fitted aggregate forecast errors (June, MAE) (Specification 2)
52 CPB predictions and fitted aggregate forecast errors (September, MAE) (Specification 2)
53 CPB predictions and fitted aggregate forecast errors (December, MAE) (Specification 2)
54 CPB predictions and fitted aggregate forecast errors (spring, RMSE) (Specification 2)
55 CPB predictions and fitted aggregate forecast errors (June, RMSE) (Specification 2)
56 CPB predictions and fitted aggregate forecast errors (September, RMSE) (Specification 2)
57 CPB predictions and fitted aggregate forecast errors (December, RMSE) (Specification 2)
58 Fan chart using MAE, Spring forecasts (specification 2)
59 Fan chart using RMSE, Spring forecasts (specification 2)
60 Fan chart using MAE, June forecasts (specification 2)
61 Fan chart using RMSE, June forecasts (specification 2)
62 Fan chart using MAE, September forecasts (specification 2)
63 Fan chart using RMSE, September forecasts (specification 2)

1 Introduction

Economic forecasting occupies a distinguished position in econometrics, as it provides important information for policy analysis. Policy decisions that take economic forecasts as inputs can have long-lasting, significant impacts on members of the economy. Therefore, forecasting accuracy is a key target for econometricians. Examples of macroeconomic variables which are commonly forecasted include unemployment rates, price indexes, interest rates, exchange rates, and commodity prices. This paper focuses on improving the current methodology which the Centraal Planbureau (CPB) uses for predicting the future real GDP growth rate. In particular, I use measures of uncertainty to construct confidence intervals around the central projections.

Following the 2008 financial crisis, prediction accuracy has improved in the Netherlands, both compared to past performance and to other countries' performance. This can be seen in figure 1, where, for each country, the left bar is the standard deviation of forecast errors before 2008 and the right bar is the standard deviation for the years following 2008.1 Forecast accuracy in most countries has become worse since 2008, the year of the global financial crisis (Greece, Finland, Germany, the United Kingdom and Denmark), while forecast accuracy in the Netherlands and Belgium improved after the crisis. In some countries (Ireland, Luxembourg and Sweden), forecasting performance improved, but the forecast errors still remained relatively high.

It is important for the CPB to determine whether this improvement is due to the employment of better forecasting methods or to a decline in real GDP volatility. The CPB would like to know whether predictions can be improved by using uncertainty measures to construct dynamic confidence intervals around forecasts, replacing the current static alternative. To this end, it is necessary to understand which uncertainty measures affect forecasting errors (prediction quality), which is where this paper contributes. The aim of the CPB is to reform the current prediction methodology, in particular, changing the current state of affairs, in which past forecast errors are taken as time-invariant measures of uncertainty. It is proposed that by incorporating additional uncertainty measures, more flexible, time-variant prediction intervals can be obtained. This is referred to by the CPB as dynamic confidence intervals.2

1The source for this figure is an unpublished CPB memo.

Figure 1: Standard deviation of within-country forecast error: before and after 2008

The figure contains the average over time of the within-country standard deviation of the forecast error. For each country, the left bar is the average over the years before 2008, the year of the global financial crisis, and the right bar is the average over the years after 2008.

Let me elaborate on why evaluating economic uncertainty is a challenging task. Uncertainty can be thought of as consisting of three types: past, present, and future. Uncertainty about the past comes from two sources. Firstly, there is a time lag in obtaining accurate approximations of recent values of macroeconomic variables: data such as real GDP are revised after some time, as their accurate computation is lagged. Secondly, for some variables, such as real GDP, the true value is never known with certainty, as it is only estimated, never accurately computed, in contrast to variables such as inflation, for which after its computation we have a precise index. The problem of present uncertainty is that policymakers need to make decisions which depend heavily on the current state of the economy, which is unknown or inaccurately estimated.3 Finally, we simply do not know the future accurately.

2The terms static and dynamic confidence intervals are coined in the sense of this paper by the CPB. These are not terms that have a precise meaning for econometricians. Thus, the reader can think of the current forecasting methodology employed by policymakers as the static version and the work in this paper as the dynamic version of prediction intervals.
3For example, quarterly data are published forty days after the end of each quarter. This time-lagged information creates present uncertainty for forecasters.

Aiming to make the economy less vulnerable to shocks and to mitigate the impacts of bad outcomes, the Bank of England has been publishing two-year-ahead forecasts of the inflation rate using a technique known as "fan charts" in its Quarterly Inflation Report since 1996. Fan charts present the probability distributions of several forecasted macroeconomic variables, namely the real GDP growth rate, inflation, the unemployment rate and the federal budget balance.4 Consider the Bank of England's fan chart for the CPI in figure 2, where the prediction was made in late 2015. The line in the non-shaded area consists of past realizations and the colored area in the shaded area is the fan chart. A lighter color corresponds to wider confidence intervals around the central projection. I am interested in evaluating the methods with which the Bank of England constructs these confidence intervals and in proposing a better methodology for doing so.

Figure 2: Bank of England Fan Chart for CPI Inflation

This is the CPI fan chart constructed by the Bank of England in 2015. The confidence intervals of the two-year-ahead forecasts are presented in the shaded area and spread out from the central projection. The confidence intervals become wider as the forecasting horizon increases. The red line shows the historical realizations up to 2015.

The method used by the Bank of England is peculiar, as it combines an empirical model (see Dowd [2008] for the model specification) with subjective expert advice. That is, there is a model underlying the formation of prediction intervals; however, its parameters are not estimated by a coherent, reproducible statistical procedure, but rather are determined by the members of the Bank's Monetary Policy Committee. One can reasonably argue that this semi-scientific approach has at least two advantages. Firstly, the experts on the committee can make predictions based on intangible information which is hard to quantify (e.g. political unrest). Secondly, if, at the other extreme, using only a model whose parameters have been objectively estimated would yield absurd prediction intervals, the members of the committee can recognize this and amend the prediction in a sensible direction. However, there is a strong sense in which the Bank of England's method cannot be used as a consistent policy tool.

4In the literature, inflation is the most common variable for which a fan chart is constructed. The reason is that the inflation rate does not need to be revised, thus reducing past uncertainty. This is in contrast to, for example, the real GDP growth rate.

Let me argue why a subjective formation of the fan charts' prediction intervals is problematic. Clearly, the opinion-based nature of this procedure makes it not reproducible; that is, prediction intervals depend on the personnel composition of the committee. Moreover, the supposed benefit of this approach, namely that it allows the incorporation of data which is difficult to quantify, disappears when considering recent developments in data science, particularly in applying text analysis to quantify uncertainty. In an influential recent paper, Baker et al. [2015] construct a measure of economic uncertainty based on textual analysis of leading newspapers. Specifically, they search for and count the appearance of words which imply uncertainty. Notably, their measure performs well in quantifying uncertainty (for example, the uncertainty measure for the United States grows significantly around major events such as the sub-prime mortgage crisis and the September 11 attacks). I examine this measure, among others, as an explanatory variable for constructing prediction intervals as functions of current economic uncertainty.

The method employed by the Bank of England has been criticized in empirical evaluation studies. The most important evaluation of the Bank's past predictions' performance can be found in Dowd [2008].5 It is found that the predictions do not perform well, especially for the short run (a horizon of up to one year). The paper which is closest in content to my research is Cornec [2014], who also criticizes the Bank of England's methodology and proposes a new method. In particular, he uses data from business surveys as a proxy of the economy's state for making inference about prediction quality.6 I complement this paper by using a wider range of economic indicators, as business surveys alone are insufficient when trying to proxy for uncertainty.

5There are many papers that evaluated inflation fan charts; however, this paper, along with Elder [2005], are the only papers which evaluated fan charts for the real GDP growth rate. The latter, however, assumed the data are independent, which is questionable.
6In addition to criticizing the Bank of England's methodology, Cornec [2014] also discusses the methodology of the National Institute of Statistics and Economic Studies (INSEE), a government institution in France, which is arguably even worse. They use constant prediction intervals, obtained by simply taking the standard deviation of past forecast errors, regardless of how far in the future the prediction is aiming. I view this as further motivation to improve the existing methodology.

The main objective of this paper is to investigate the relation between past forecast errors of the real GDP growth rate and several economic uncertainty indicators. The CPB can then use the results for creating an improved fan chart for forecasts (e.g. inflation, GDP) based on those objective indices. The explanatory variables used, which aim to measure uncertainty, are stock market volatility, Consensus Forecast data, the Economic Sentiment Index, the oil price and the Economic Policy Uncertainty Index. These exogenous variables are expected to identify high-uncertainty regimes. I begin with a specification including Consensus Forecast data, as requested by the CPB. This requirement constrains the data to be notably short, and I therefore conclude that it is better to exclude this variable. However, the CPB can continue with this variable in future research when it obtains longer data. In two other specifications I do not include Consensus Forecast data and therefore obtain a longer sample period. As is typical in this type of study, I do not enjoy an abundance of data. Nevertheless, the results can be used as suggestive evidence for starting to construct an alternative method of forming prediction intervals around important macroeconomic variables.

The paper proceeds as follows. The methodology is presented in section 2. In section 3 I describe the data, their sources and some intuition regarding my variable selection. Summary statistics can be found in section 4. Section 5 presents the empirical results and discusses the inference that can be made from them, and section 6 concludes with a discussion and policy recommendations.

2 Methodology

Let me provide a brief overview of the procedure I apply in this paper. I begin by examining the effects of measures of uncertainty on forecast errors. To do this, I use yearly data to regress aggregate measures of forecast errors on uncertainty measures. For the point in time at which the prediction is made, I use time series models to predict future values of the uncertainty measures. I then use the prediction intervals of these forecasts, along with the estimated coefficients of the regressions mentioned above, to construct predicted forecast errors for the real GDP growth rate. It is constructive to view the method as a three-step procedure, described informally as follows:

Step 1 Regress aggregate measures of real GDP growth rate forecast errors on international uncertainty measures, using OLS.

Step 2 Use ARMA models to create predictions and prediction intervals for the future values of the uncertainty measures (out-of-sample prediction).

Step 3 Using the estimated coefficients from step 1 and the prediction intervals from step 2, construct prediction intervals for future forecast errors, which are used to create a fan chart.

I will now elaborate on each step in a formal manner.

2.1 Step 1

The explanatory variables used in the first step are country-invariant. Therefore, a measure of the forecast errors which is aggregated over the countries in my sample (European countries) is required. This will be the dependent variable. I use two methods for this aggregation, namely the mean absolute error (MAE) and the root mean squared error (RMSE). Both are natural measures of distance between predictions and realizations. Let $y_{i,t}$ denote the variable for which forecasting is performed for country $i$, and let $e_{t+h,i,t}$ denote the forecast error for period $t+h$ when using predictions made at period $t$ ($h > 0$). That is,

$$e_{t+h,i,t} = \hat{y}_{t+h,i,t} - y_{i,t+h},$$

where $\hat{y}_{t+h,i,t}$ stands for the forecast for period $t+h > t$ made at period $t$. The aggregate forecast error measures, which will be used as the dependent variable in the first step of my procedure, are calculated as follows:

$$\bar{e}^1_t = \mathrm{MAE}_t = \frac{1}{n} \sum_{i=1}^{n} |e_{t+h,i,t}|,$$

$$\bar{e}^2_t = \mathrm{RMSE}_t = \left( \frac{1}{n} \sum_{i=1}^{n} e^2_{t+h,i,t} \right)^{\frac{1}{2}}.$$

The explanatory variables capturing uncertainty are collected, along with a constant, in $X_t = (X_{t,1}, \ldots, X_{t,K})$, giving the linear model

$$\bar{e}^j_t = X_t \alpha + u_t \quad \text{for } j = 1, 2,$$

where $\alpha$ is a vector of estimable parameters and $u_t$ is an error term.

I assume exogeneity, that is, the explanatory variables $X_t$ are uncorrelated with the error term $u_t$. Here $\bar{e}^j_t$ and $u_t$ are scalars, $\alpha$ is of dimension $K \times 1$ and $X_t$ is of dimension $1 \times K$. In my dataset the full model has $K = 4$, and I experiment with different specifications, choosing from those explanatory variables. OLS regression yields the estimated coefficients

$$\hat{\alpha}^t = (\hat{\alpha}^t_1, \hat{\alpha}^t_2, \ldots, \hat{\alpha}^t_K), \quad (1)$$

which I save for step 3. Note that a superscript $t$ is added to the estimated parameters. The reason is not that they are time-varying (they are not), but rather to stress the fact that these are coefficients obtained using data available up to period $t$.
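To make step 1 concrete, the sketch below computes the aggregate error measures and runs the OLS regression. It is a minimal illustration in Python; the data layout and column names are my own assumptions, not the thesis code.

```python
import numpy as np
import statsmodels.api as sm

def aggregate_errors(e):
    """e: 1-D array of forecast errors e_{t+h,i,t} across countries i,
    for a fixed forecast year t. Returns (MAE_t, RMSE_t)."""
    return np.mean(np.abs(e)), np.sqrt(np.mean(e ** 2))

def step1(df, dependent="mae", regressors=("oil", "news", "esi", "vox")):
    """df: yearly DataFrame with the aggregate forecast error ('mae' or
    'rmse') and the uncertainty measures. Returns the OLS estimates
    alpha-hat, which are saved for step 3."""
    X = sm.add_constant(df[list(regressors)])
    res = sm.OLS(df[dependent], X).fit()
    return res.params, res
```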

2.2 Step 2

I use traditional time series models for obtaining confidence intervals around forecasts of the explanatory variables. To this end, I follow the modeling philosophy advocated by Box and Jenkins [1976] (known as the Box-Jenkins methodology). Their procedure, which is widely used for time series in empirical applications, requires first ensuring stationarity of the series, followed by model selection, estimation of the model, and model evaluation via diagnostic tests to confirm the chosen model before making predictions. I start by discussing stationarity, the first requirement of a time series for making predictions within this framework. The strongest form of stationarity is strict stationarity, which means that the distribution of the time series does not change over time.

Formally, a series $\{y_t\}$ is strictly stationary if $F(y_{t_1}, \ldots, y_{t_r}) = F(y_{t_1+h}, \ldots, y_{t_r+h})$ for all $h$ and $r$. However, this is a very strong condition that is also hard to verify empirically. A weaker version of stationarity, called weak stationarity (or covariance stationarity), requires that the expectations and variances of $\{y_t\}$ are time-invariant and that the covariance between $y_t$ and $y_{t+h}$ does not depend on $t$ for each $h$.7 From an empirical standpoint, weak stationarity implies that if we plot the series, its values should roughly fluctuate with constant variation around a fixed level. Along with plotting the graphs of the series, I use the Dickey-Fuller test to check whether a series is stationary.8 For non-stationary series, I transform the data by taking first differences in order to obtain stationary series.9 Testing the transformed series confirms that this transformation yields stationary series. Using the transformed series, I proceed to the next step in prediction, namely model selection.

7These are standard definitions that can be found in, for example, Hamilton [1994].

To identify which model is most suitable (within the class of ARMA models) for representing each series, I examine the values of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). The goal is to select, among the candidate models, the one that minimizes the information loss. As the aim of the paper is to develop a method which minimizes the use of subjective human intervention, I perform automated model selection. The information criteria AIC and BIC are useful as they balance goodness of fit against model complexity. That is, these criteria prefer a better fit to the data, but have a penalty term which increases in the number of parameters of the model, that is, the number of autoregressive and moving average coefficients. This is also in the spirit of Box and Jenkins [1976], who argue for the importance of using parsimonious models. We cannot choose a model and be certain that no other model would have been better, but by minimizing these information criteria, we can mitigate the trade-off between goodness of fit and model complexity. The information criteria are given by:

$$\mathrm{AIC} = \log \hat{\sigma}^2 + 2\, \frac{p+q+1}{T} \quad (2)$$

$$\mathrm{BIC} = \log \hat{\sigma}^2 + 2\, \frac{p+q+1}{T} \log T \quad (3)$$

where $\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} \hat{\epsilon}^2_t$ is the residual sum of squares divided by the sample size, obtained from estimation of the model (I use maximum likelihood for this, but OLS and GMM can be used as well). Minimizing the first term, $\log \hat{\sigma}^2$, means a better fit between the model and the data. The second term is a penalty term, which increases in the complexity of the model (through the number of autoregressive coefficients $p$ and moving average coefficients $q$). Clearly, the penalty term in the BIC is larger, and therefore it may choose a more parsimonious model.

8The null hypothesis is that a unit root is present, which is evidence of non-stationarity.
9There are other commonly used transformations of time series for achieving stationarity, for example, taking the difference between the log and the lagged log, or computing the rate of change.

Once a model is selected, I evaluate it by checking whether the residuals satisfy the white noise assumption. Let me recall three different definitions of white noise. If a residual process has mean zero and constant variance, and exhibits no serial correlation, we call it weak white noise. If, in addition, the residual process is independently and identically distributed, we call it strong (independent) white noise. Finally, a residual process that moreover follows a normal distribution is referred to as Gaussian white noise.
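The thesis does not name a specific diagnostic here; as one standard possibility, a Ljung-Box test on the residuals checks the no-serial-correlation part of the white noise assumption. A hedged sketch using statsmodels:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

def residuals_look_white(resid, lags=10, level=0.05):
    """Ljung-Box test of H0: no autocorrelation up to the given lag.
    Large p-values are consistent with (weak) white noise residuals."""
    lb = acorr_ljungbox(resid, lags=[lags], return_df=True)
    return bool(lb["lb_pvalue"].iloc[0] > level)
```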

The reason for making predictions of the explanatory variables is that I would like to obtain confidence intervals for these predictions, to be used in step 3. To be able to construct confidence intervals which are intuitive and simple to compute, I assume normality of the error terms in the ARMA models which are chosen to represent the series. Under this assumption there is a well-known formula for the prediction confidence intervals. Let me provide the theory for constructing these confidence intervals.10

If the process $\{X_{t,k}\}$ is weakly stationary, then it has an ARMA($p$, $q$) representation and can be written in its MA($\infty$) form. This result is the well-known Wold decomposition:

$$X_{t,k} = \mu_k + \sum_{j=0}^{\infty} \theta_{j,k}\, \epsilon_{t-j,k},$$

where

$$E[X_{t,k}] = \mu_k, \qquad \mathrm{Var}[X_{t,k}] = \sigma^2_k \sum_{j=0}^{\infty} \theta^2_{j,k}, \qquad \sigma^2_k = \mathrm{Var}[\epsilon_k],$$

and the $\epsilon_{t-j,k}$ are error terms. The normality assumption allows us to write a $100(1-\alpha)\%$ confidence interval for the prediction as

$$\hat{X}_{t+h,k,t} \pm z_{\alpha/2}\, \sigma_k \left( 1 + \sum_{j=1}^{h-1} \theta^2_{j,k} \right)^{\frac{1}{2}},$$

where $z_{\alpha/2}$ is such that $P(Z > z_{\alpha/2}) = \frac{\alpha}{2}$ for a standard normal $Z$. Performing the prediction procedure for each of the explanatory variables yields two vectors of upper and lower bounds, denoted as

$$u_{t+h,t} = (u_{t+h,1,t}, \ldots, u_{t+h,K,t}),$$

$$l_{t+h,t} = (l_{t+h,1,t}, \ldots, l_{t+h,K,t}),$$

which are saved for use in step 3. This concludes step 2. Notice that an important assumption made in the discussion of step 2 is that the explanatory variables' time series are mutually uncorrelated. This assumption justifies making a separate forecast for each of the variables. Under the more realistic assumption that these variables are correlated, multivariate forecasting has to be performed; this is beyond the scope of this paper. Moreover, it has no impact on steps 1 and 3: my methodology can easily be extended to allow for correlation between the time series by switching to a multivariate forecasting method, while steps 1 and 3 remain unchanged.

10These are well-known results which I reiterate here for completeness. Two good references which I used are Hamilton [1994] and Čížek [2014].
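As an illustration of the interval formula above, the following sketch computes the h-step bounds from the estimated MA coefficients (function and variable names, and the use of scipy/numpy, are my own assumptions, not the thesis code):

```python
import numpy as np
from scipy.stats import norm

def ma_interval(x_hat, sigma, theta, h, alpha=0.10):
    """(1 - alpha) prediction interval for the h-step-ahead forecast
    x_hat, given the innovation standard deviation sigma and the MA
    coefficients theta = [theta_1, theta_2, ...] of the fitted model."""
    z = norm.ppf(1 - alpha / 2)                       # z_{alpha/2}
    s = 1.0 + np.sum(np.asarray(theta[:h - 1]) ** 2)  # 1 + sum theta_j^2
    half = z * sigma * np.sqrt(s)
    return x_hat - half, x_hat + half                 # (l, u) for step 3
```

In practice, library forecast routines (e.g., in statsmodels) return equivalent intervals directly under the same normality assumption.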

2.3 Step 3

In the third step, the outputs of the first two steps are used to plot a fan chart. Specifically, they are used for determining reasonable bounds on predicted forecast errors, that is, for constructing prediction intervals. Start by dividing the estimates into two vectors according to their signs:

$$\hat{\alpha}^t_+ = \{\hat{\alpha}^t_k \mid \hat{\alpha}^t_k > 0\} \quad (4)$$

$$\hat{\alpha}^t_- = \{\hat{\alpha}^t_k \mid \hat{\alpha}^t_k < 0\} \quad (5)$$

Further define

$$\kappa^t_+ = \{k \mid \hat{\alpha}^t_k \in \hat{\alpha}^t_+\} \quad (6)$$

$$\kappa^t_- = \{k \mid \hat{\alpha}^t_k \in \hat{\alpha}^t_-\}. \quad (7)$$

These are the indexes of the positive and negative coefficients, respectively. Now, I define the lower and upper bounds of the predicted forecast errors as follows:

$$U_{t+h,t} = \sum_{k \in \kappa^t_+} \hat{\alpha}^t_k u_{t+h,k,t} + \sum_{k \in \kappa^t_-} \hat{\alpha}^t_k l_{t+h,k,t} \quad (8)$$

$$L_{t+h,t} = \sum_{k \in \kappa^t_+} \hat{\alpha}^t_k l_{t+h,k,t} + \sum_{k \in \kappa^t_-} \hat{\alpha}^t_k u_{t+h,k,t}. \quad (9)$$

Under these definitions, $L_{t+h,t}$ is the lowest possible predicted forecast error and $U_{t+h,t}$ is the highest. Notice that $U_{t+h,t} > L_{t+h,t}$ always holds, as

$$\sum_{k \in \kappa^t_+} \underbrace{(u_{t+h,k,t} - l_{t+h,k,t})}_{>0}\, \underbrace{\hat{\alpha}^t_k}_{>0} + \sum_{k \in \kappa^t_-} \underbrace{(l_{t+h,k,t} - u_{t+h,k,t})}_{<0}\, \underbrace{\hat{\alpha}^t_k}_{<0} > 0.$$

Now I define the prediction intervals for the forecast errors as:

$$\hat{y}_{t+h,i,t} \pm \left[ \gamma L_{t+h,t} + (1-\gamma) U_{t+h,t} \right], \quad \text{where } \gamma \in [0,1]. \quad (10)$$

When $\gamma = 1$, the prediction interval is $\hat{y}_{t+h,i,t} \pm L_{t+h,t}$; the confidence intervals are then the narrowest, which corresponds to a low tolerance of error in predictions. When $\gamma = 0$, the prediction interval is $\hat{y}_{t+h,i,t} \pm U_{t+h,t}$, which corresponds to a high tolerance of error in predictions. If $\gamma = \frac{1}{2}$, the prediction intervals are intermediate. Note that, as in the Bank of England's fan charts, I allow for varying degrees of confidence in the predictions. Using $L_{t+h,t}$, the narrowest interval, reflects confidence that the forecast is relatively accurate, whereas using $U_{t+h,t}$, the widest interval, reflects a higher tolerance of error on the part of the forecaster. To create a fan chart, one takes these two extremes and several intermediate values. Thus $\gamma$ proxies error tolerance: as error tolerance increases ($\gamma$ decreases), the prediction interval approaches $\hat{y}_{t+h,i,t} \pm U_{t+h,t}$.
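A minimal sketch of step 3, assuming the step 1 coefficients and the step 2 bounds are held in NumPy arrays (my own illustration, not the thesis code):

```python
import numpy as np

def step3_bounds(alpha_hat, u, l):
    """Implements eqs. (8)-(9): alpha_hat, u, l are length-K arrays of
    estimated coefficients and the upper/lower bounds of the uncertainty
    measures' prediction intervals."""
    pos = alpha_hat > 0
    U = alpha_hat[pos] @ u[pos] + alpha_hat[~pos] @ l[~pos]
    L = alpha_hat[pos] @ l[pos] + alpha_hat[~pos] @ u[~pos]
    return U, L

def fan_interval(y_hat, U, L, gamma):
    """Eq. (10): interval around the CPB point forecast y_hat; gamma in
    [0, 1] proxies error tolerance (gamma = 1 gives the narrowest band)."""
    half = gamma * L + (1 - gamma) * U
    return y_hat - half, y_hat + half
```

Evaluating fan_interval over several values of gamma and over the forecast horizon yields the nested bands of the fan chart.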

3 Data description

In this section I describe the data used for the empirical analysis in the following sections. The data were collected from several different sources. For each variable I describe its sampling period (which varies according to data availability), give a brief description, state its source, and provide reasons for its potential inclusion.

The real GDP growth rate forecast error has been obtained by the CPB from the European Commission.11 This is the dependent variable which I will try to explain using the explanatory variables below. The variable is computed twice every year, corresponding to forecasts made in March about the current year (CY) and in August about one year ahead (YA). These data are available from 1969 until 2014. I am interested only in the year-ahead forecasts.

11These data were used in Marco Fioramanti et al. [2016], a discussion paper which evaluates European forecasting practices.

1. Consensus Economics is a private macroeconomic survey firm that publishes a monthly forecast report providing estimates of several variables, such as the GDP growth rate, inflation, exchange rates and oil prices, for over 20 countries. Almost 30 institutions in each country contribute to the forecasts for the current year and one year ahead. I selected only the standard deviation of the real GDP growth rate forecasts for five countries, namely the United States, Germany, France, the United Kingdom and the Netherlands. I then took the average of these standard deviations to represent general forecast performance. The CPB has purchased monthly data from Consensus Economics starting only in 1997.

2. The Economic Policy Uncertainty Index (news) is the measure developed in Baker et al. [2015].12 An index is computed especially for Europe and is based on three components. The first is the frequency of terms which relate to policy uncertainty in several newspapers from France, Germany, Italy, Spain and the UK.13 The second component is expected changes in tax laws, in particular tax provisions which will expire within 10 years. The last component is based on expert advice, drawing on professional forecasters' predictions. The index aggregates these components into an overall uncertainty index. The merits of this index, namely how well it performs on past data, are described in detail by Baker et al. [2015]. The sample frequency is monthly, from 1987 to 2013. The inclusion of this variable is an obvious choice, as I am indeed looking for variables which measure uncertainty. As this index is relatively new, a by-product of my research is an evaluation of whether this seminal index can be used for improving central banks' prediction abilities.

3. The Economic Sentiment Index (esi) is constructed as a weighted average of five economic confidence indicators.14 As these are survey data, they represent subjective feelings about the economy's state. The sample frequency is monthly, from 1985 to 2013. The data were obtained from the European Commission's Eurostat database.

12This variable can be downloaded for free from www.policyuncertainty.com, a website which the authors of Baker et al. [2015] maintain and update.
13There are also country-specific indexes, including one for the Netherlands, but I chose the European version, as the dependent variable is an aggregation over European countries. The methodology for constructing the Dutch country-specific economic uncertainty index, based on the methods of Baker et al. [2015], was developed in Kroese et al. [2015].

4. Stock market volatility (vox) data are obtained from the Chicago Board Options Exchange (CBOE) and are available for the end of every trading day since 1986.15 Aggregate volatility is considered to be a good measure of economic uncertainty. Notice that the fact that the data come from the United States, rather than from European financial markets, is not a concern, because of the high correlation between these measures due to international trade. The precise methodology for constructing this index can be found in Exchange [2003]. To have longer data, I use the index based on the old methodology, known as VOX, rather than the VIX. This index is often referred to as a "fear index": when the stock market becomes more volatile, economic uncertainty is high, and the standard deviation of forecast errors is expected to become larger as well.

5. Oil prices (oil) have been obtained from the OECD database for the time period 1973-2014, at the yearly frequency. The oil price is considered an important economic indicator for various reasons. Oil is the major commodity of the energy market. When the economy is booming, we expect the demand for oil to increase, leading to an increase in the price of oil. Moreover, the oil price is strongly related to politics and international relations, as oil is produced by a specific group of countries, namely the Organization of the Petroleum Exporting Countries (OPEC).16 The supply side of oil is complicated by varying political circumstances, export capacities and so on. Therefore, the oil price gives us a different perspective on uncertainty. For example, in times of war, the oil price is expected to increase, as the world situation is uncertain. The oil price is also influenced by exchange rates, because oil is priced in US dollars. In this paper, I use the oil price level to measure the state of the economy, rather than the standard deviation or percentage changes of the oil price, because if an event affects the economy for a longer period, the price of oil will remain high, so that neither the standard deviation nor percentage changes are able to reflect the uncertain situation.

14The composition is: industrial confidence indicator (40%), services confidence indicator (30%), consumer confidence indicator (20%), construction confidence indicator (5%) and retail trade confidence indicator (5%). A detailed description of the construction of the ESI can be found at http://ec.europa.eu/eurostat/web/products-datasets/-/teibs01.
15The data can be downloaded from http://www.cboe.com/micro/volatility/.
16Algeria, Angola, Ecuador, Gabon, Indonesia, Iran, Iraq, Kuwait, Libya, Nigeria, Qatar, Saudi Arabia, United Arab Emirates and Venezuela.

Given that the dependent variable has a yearly frequency, explanatory variables with a different sample frequency have been transformed into yearly series at some points in the analysis. Whenever I use the explanatory variables along with the dependent variable (in step 1), I use the yearly version of all series. On the other hand, when I use the explanatory variables' time series individually (in step 2), I use their monthly versions. The data for the variables news and esi were monthly, whereas the data for vox were daily. For all variables, whenever the yearly version is used, I took the within-year average to obtain yearly series. Data for oil were already yearly and were therefore left in their original form. I experimented with adding changes in commodity prices as additional explanatory variables (e.g., gold and silver prices); however, these caused multicollinearity, so I decided to continue without them.

The variables esi and oil were obtained from the databases mentioned above, but rather than downloading them from the web and importing them into statistical software, I accessed them using the API of Quandl.com. This way, the data are conveniently imported directly into the statistical software. Various software packages can access the API, particularly R and Python. Another advantage of Quandl.com is that it gathers together many databases and allows the researcher to search all of them at the same time.
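For reference, accessing Quandl from Python looks roughly as follows. The dataset codes shown are placeholders for illustration only, not the exact series used here:

```python
import quandl  # the official Quandl Python package

quandl.ApiConfig.api_key = "YOUR_API_KEY"

# Hypothetical dataset codes; substitute the actual Quandl codes of the
# desired oil price and ESI series.
oil = quandl.get("SOME_DB/OIL_PRICE")
esi = quandl.get("SOME_DB/ESI")
```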

4 Summary Statistics

4.1 Dependent Variable

I consider two measures of forecast error, which serve as dependent variables: the mean absolute error (MAE) and the root mean squared error (RMSE) of forecast errors. Figure 5 plots these measures. The explanatory variables I consider are: oil prices, a stock market volatility index, the Economic Sentiment Indicator, and the Economic Policy Uncertainty Index (news-based index).

I provide information on the distribution of the variables considered in the analysis. To get an idea of the heterogeneity across countries in forecasting quality, I constructed a relative forecast performance score. For each year, I computed an indicator function for each country, indicating whether its forecast error was lower than the median forecast error within that year. For each country, I sum this indicator function across years and divide by the number of years for which we have its data (roughly the number of years the country has been in the European Union). These scores are presented in figure 3. As many countries joined the European Union in 2004, and because we would like to get an idea of recent performance, I calculated the same measure using data starting in 2004, a year that saw a significant increase in the number of member states; this is plotted in figure 4. Generally, France performs better than all other European countries in making predictions about the real GDP growth rate.
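For concreteness, the score can be computed as follows (a sketch; the pandas layout is my own assumption):

```python
import pandas as pd

def performance_scores(errors):
    """errors: DataFrame indexed by year, one column per country,
    holding |forecast error|; NaN where a country is not in the sample.
    Returns, per country, the share of its sample years in which its
    error was below that year's cross-country median."""
    below = errors.lt(errors.median(axis=1), axis=0)
    return below.sum() / errors.notna().sum()
```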

Figure 3: Forecast performance score by country (1975-2014)

The forecast performance score is calculated for each country by dividing the number of years in which that country's forecast error was smaller than the median by the number of years it has been in the sample (determined by when it joined the EU).

Figure 4: Forecast performance score by country (2004-2014)

The forecast performance scores in this figure are computed just as in figure 3, but using only the years 2004-2014. I choose 2004 as a turning point because in that year a large number of countries joined the EU.

We know that when there is a financial shock, forecast errors become larger than before the shock. Let me recall some important financial shocks of recent decades and examine the forecast performance in those years. The September 11 attacks in 2001 and the bursting of the dot-com bubble, which started in the same year, may have contributed to the maximum forecast error being higher than in previous years. Between 2005 and 2006, the United States housing bubble caused a local recession that had effects on the world economy, and indeed, forecast performance became worse again during this period. In 2008, when the global financial crisis happened, the one-year-ahead forecast error increased almost fourfold compared to the previous year, and the forecast errors reached their highest value of 15.3 in 2009. We can also see some suggestive country-specific patterns in prediction quality: Latvia and Ireland do not perform well, while Belgium, Germany, Spain, the United Kingdom, Austria and Poland do relatively better than other countries.

Figure 5 plots the two measures of forecast errors for one year ahead. Both measures capture uncertainty in the financial crisis well, as their values peaked then. Obviously, the two methods are highly correlated. For completeness, figure 6 presents summary statistics of the dependent variables in the EU-27 countries (whenever available) for one year ahead. As we can see from the figure, forecast accuracy became poor and the forecast range became large when the crisis happened.

Figure 5: Aggregate measures of forecast error

This figure shows the two aggregate measures of forecast errors (MAE and RMSE) over time. Naturally, there is a very high correlation between the two measures. Observe that at times of economic crisis, such as 1973 and 2008, both measures capture the increase in forecasting errors.

Figure 6: Summary statistics of forecast error

The mean, standard deviation and range of the dependent variable (aggregated forecast errors) are plotted over time from 1970 to 2014. When there was a financial crisis, the mean and standard deviation increased and the range of forecast errors became wider.

4.2 Explanatory Variables

Table 1 presents summary statistics of the explanatory variables. The short names are vox (stock market volatility index), news (Economic Policy Uncertainty Index), esi (Economic Sentiment Index) and oil (oil prices). There are 27 observations for each variable, as the years for which all variables were available are 1987 to 2013. The variables oil and news are the most volatile, while vox is the least. In order to investigate whether there is multicollinearity among the explanatory variables, the correlogram is shown in table 2. We see that the correlations between those variables are mostly less than 0.5 in absolute value. However, oil and news are highly positively correlated (0.750); this may introduce multicollinearity, which would make the regression results unreliable. Vox has weak positive correlations with esi and news. There are negative correlations between the pairs news-esi, esi-oil and vox-oil. The negative correlation between news and esi is somewhat surprising, because both consumers and the news-based index are influenced by newspapers.

Table 1: Summary statistics of explanatory variables

       count   mean      std      min      25%      50%       75%       max
oil    27.0    43.266    35.648   12.803   18.146   24.461    63.364    126.225
news   27.0    110.729   39.139   60.323   87.054   97.890    124.818   209.986
esi    27.0    100.619   9.363    78.000   95.150   101.383   107.771   116.358
vox    27.0    21.096    6.614    12.153   15.415   21.603    25.690    34.691

The table presents summary statistics of the independent variables in their yearly format from 1987 to 2013. The standard deviation shows that the variables oil and news are the most volatile, while vox is the least volatile.

Table 2: Correlogram of explanatory variables

        oil      news     esi      vox
oil     1.000    0.750    -0.240   -0.052
news    0.750    1.000    -0.465   0.024
esi     -0.240   -0.465   1.000    0.063
vox     -0.052   0.024    0.063    1.000

The correlogram is useful for initial detection of strong linear relationships between the explanatory variables.

5 Empirical results

This section presents the empirical results from applying the methodology described above using several sets of explanatory variables. I start by presenting the results from applying step 1 to the data which include the Consensus Forecast data. These are presented mainly for completeness, and because the CPB is interested in this variable's inclusion. As these data are too short for drawing meaningful conclusions, I present the full analysis for two other specifications, for which longer data have been obtained.

5.1 Short dataset: using data from Consensus Economics

Tables 3 and 4 present the step 1 regression results for the specification which includes data from Consensus Economics, along with all other explanatory variables. We can see that most variables are not individually significant, even at moderate significance levels. The time span of these data is particularly short (only eighteen observations), certainly for estimating five parameters. I experimented with using the Consensus Forecast data along with a smaller subset of the explanatory variables; however, the results did not improve. In particular, I used a specification without the variable esi, as it is highly correlated with the Consensus Forecast data, as can be seen in table 5. I conclude that this variable should not be used unless it is possible to obtain a longer time series.17 The next specifications I consider improve on this point, as they are both more parsimonious in terms of how many explanatory variables are included and use more data.

17Note that as these data are sold by a private company, significant costs may be involved.

Table 3: Regression results: short data (MAE)

No. Observations: 18      AIC: 53.4794
Df Model: 5               BIC: 58.8216
Df Residuals: 12          Log-Likelihood: -20.740
R-squared: 0.474          F-statistic: 4.964
Adj. R-squared: 0.338     Prob (F-statistic): 0.0108

            Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const       2.917    4.909      0.594    0.563    -5.832   11.666
consensus   7.485    2.964      2.525    0.027     2.202   12.767
vox         0.010    0.034      0.284    0.781    -0.051    0.071
news        -0.006   0.007      -0.816   0.430    -0.018    0.007
esi         -0.041   0.038      -1.092   0.296    -0.108    0.026
oil         0.027    0.089      0.302    0.768    -0.132    0.185

These are the OLS regression results when using MAE as the dependent variable. This is the short version of the data, because it contains the variable consensus, which constrains the dataset to be short. All variables except consensus are individually insignificant, even though the F-statistic implies joint significance, which could be due to multicollinearity.

Table 4: Regression results: short data (RMSE)

No. Observations: 18      AIC: 57.0888
Df Model: 5               BIC: 62.4311
Df Residuals: 12          Log-Likelihood: -22.544
R-squared: 0.491          F-statistic: 5.355
Adj. R-squared: 0.362     Prob (F-statistic): 0.00812

            Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const       2.613    5.426      0.482    0.639    -7.058   12.285
consensus   8.168    3.277      2.493    0.028     2.328   14.007
vox         0.021    0.038      0.554    0.590    -0.046    0.088
news        -0.008   0.008      -1.090   0.297    -0.022    0.005
esi         -0.039   0.042      -0.933   0.369    -0.113    0.035
oil         0.077    0.098      0.784    0.448    -0.098    0.252

These are the OLS regression results when using RMSE as the dependent variable. This is the short version of the data, because it contains the variable consensus, which constrains the dataset to be short. The results are qualitatively similar to those in table 3.

Table 5: Explanatory variables correlogram (short data)

            consensus   smkt     news     esi      oil
consensus   1.000       -0.005   -0.084   -0.601   0.455
smkt        -0.005      1.000    0.366    -0.065   -0.177
news        -0.084      0.366    1.000    -0.296   0.339
esi         -0.601      -0.065   -0.296   1.000    -0.541
oil         0.455       -0.177   0.339    -0.541   1.000

In the shorter dataset, the consensus variable is highly correlated with esi and oil. Here smkt denotes the stock market volatility variable.

5.2 Specification 1

I will now report the results from applying the three-step procedure described in the methodology section to my main specification, in order to create the fan chart, the object of interest from the CPB's standpoint.

5.2.1 Step 1 - The effect of uncertainty on forecasting errors

Let me begin by discussing the expected signs of the estimated coefficients. I expect the coefficients on news and vox to be positive, in the sense that uncertainty positively affects the forecast errors. The coefficient on esi is expected to be negative, as consumers are usually fearful in times of economic turmoil, and in times like these I expect the forecast errors to be large. The effect of oil prices on the forecast error is ambiguous; I have no ex-ante reason to expect any direction. We can reason that when there is a war, oil prices increase and the world is in an uncertain state, so the forecast error is expected to be large. But this need not be the case, as oil prices are determined in large part by OPEC, and politics plays an important role.

The regression results when using MAE and RMSE as the dependent variable are presented in tables 6 and 7, respectively. The first observation is that all variables are, jointly and individually, statistically significant regardless of which dependent variable is used. The oil price has a significantly positive effect on forecast errors, even though the intuition behind it is not clear. The esi variable is also significant; moreover, its effect on the forecast error is negative, consistent with what I expected about consumers' subjective feelings about the economy. Moving on to the next variable, news, its estimated coefficient shows a negative effect on forecast errors, which contrasts with the expected sign. Recalling that this variable is created from selected keywords about political and economic issues in newspapers, intuitively, the uncertainty keywords should appear more often in newspapers when an unexpected event happens, consistent with larger forecast errors; yet the empirical results tell another story. For the stock market volatility variable, vox, the sign of the effect on forecast errors is as expected. Indeed, when the stock market becomes volatile, the state of the economy is uncertain; therefore, the forecast error is expected to be large.

Table 6: Dependent variable: forecast error (Mean absolute error)

No. Observations: 27      AIC: 72.5822
Df Model: 4               BIC: 79.0614
Df Residuals: 22          Log-Likelihood: -31.291
R-squared: 0.525          F-statistic: 6.090
Adj. R-squared: 0.439     Prob (F-statistic): 0.00186

         Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const    9.588    2.443      3.925    0.001     5.393   13.783
oil      0.016    0.007      2.264    0.034     0.004    0.029
news     -0.018   0.007      -2.519   0.020    -0.031   -0.006
esi      -0.084   0.021      -4.038   0.001    -0.119   -0.048
vox      0.074    0.026      2.891    0.008     0.030    0.118

These regression results are from using the longer dataset (dropping consensus). The results improve: all variables are individually significant, along with being jointly significant. Moreover, the fit to the data has improved, as shown by the adjusted R². However, the sign on news is negative, which is not as expected.

Table 7: Dependent variable: forecast error (RMSE)

No. Observations: 27      AIC: 77.0040
Df Model: 4               BIC: 83.4832
Df Residuals: 22          Log-Likelihood: -33.502
R-squared: 0.565          F-statistic: 7.149
Adj. R-squared: 0.486     Prob (F-statistic): 0.000758

         Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const    10.460   2.651      3.945    0.001     5.907   15.013
oil      0.021    0.008      2.712    0.013     0.008    0.035
news     -0.022   0.008      -2.808   0.010    -0.036   -0.009
esi      -0.091   0.022      -4.040   0.001    -0.129   -0.052
vox      0.095    0.028      3.433    0.002     0.048    0.143

These are the regression results when using RMSE as the dependent variable. Results are qualitatively similar to those in table 6.

In tables 6 and 7, we can see that all the explanatory variables are individually and jointly significant at the 5% significance level in specification 1. This may be the result of both obtaining a longer dataset and dropping the consensus variable, so that each coefficient estimate gains more information from the observations.

Let me stop here and make an important, subtle point before continuing. The data used in step 1 are European data, whereas the interest of the paper is in the predictions and realizations of the real GDP growth rate made by the CPB for the Netherlands. These predictions and realizations are the ones used in step 3. Let me provide partial evidence that the method is applicable even given this data mismatch. In figure 7 I plot the realizations and predictions made by the CPB. Around the predictions, I plot the forecast errors implied by the model from step 1, obtained as the fitted values from the regression. We see that in almost all years, the realizations are within the implied forecast errors, which suggests that using European aggregate data in step 1 for making prediction intervals on country-specific macroeconomic variables is a reasonable approximation. This is indicated by the percent correct number in the figure: in 92.31% of the years, an interval constructed from the fitted values of step 1 around the CPB prediction includes the realization.18 The plot in figure 7 uses the fitted values from the RMSE specification, along with the forecasts made by the CPB in spring. Similar plots, using the MAE specification and forecasts made by the CPB at other points in time (June, September and December), can be found in the appendix and generally lead to the same conclusion.

18The reason for not using Dutch data in step 1 is a technical data limitation, namely that the forecast data I have obtained from the CPB start only in 2000, which would imply a data time span that is too short for any estimation. Surely, it is advisable to use a longer version of these data if available. Correspondingly, one can use country-specific explanatory variables, such as the country-specific news variable (which has recently been made available for the Netherlands) and local stock market volatility. This is left for future research.

Figure 7: CPB predictions and fitted aggregate forecast errors (spring, RMSE)

This figure provides evidence that the European aggregate data used in step 1 are representative enough to be used for country-specific predictions (here, the Netherlands). The plot contains the CPB forecast for real GDP growth in the Netherlands, around which I plot the prediction interval (the shaded area) implied by the fitted values from step 1. In 92.31% of the years, the realization fell within this implied prediction interval, which is some evidence that using aggregate European data to make predictions about one European country is potentially justified. This figure uses RMSE and the spring CPB forecast; similar figures using other forecasts and the MAE dependent variable are in the appendix.
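The percent correct number is simply the empirical coverage of the implied intervals. A sketch of how it can be computed (array names are my own assumptions):

```python
import numpy as np

def coverage(realized, cpb_forecast, fitted_error):
    """Share of years in which the realization lies inside the band
    [forecast - fitted_error, forecast + fitted_error] implied by the
    step 1 fitted values."""
    inside = np.abs(realized - cpb_forecast) <= fitted_error
    return inside.mean()
```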

5.2.2 Step 2 - Uncertainty measures prediction intervals

Moving on to step 2, the prediction of the explanatory variables, I begin with model selection for each of the variables. The goal is to use data up until 2013 to make predictions for 2014 and 2015. The time series are plotted, along with summary statistics, in figures 8, 9, 10 and 11.

Figure 8: Raw time series: oil

This figure contains the time series and summary statistics for the variable oil. Clearly, the mean of the series is not constant over time.

Figure 9: Raw time series: esi

This figure shows the time series and summary statistics for the variable esi. Both the mean and the standard deviation appear to be quite stable over time.

Figure 10: Raw time series: news

This figure shows the time series and summary statistics for the variable news. The mean seems to increase over time.

Figure 11: Raw time series: vox

This figure shows the time series and summary statistics for the variable vox. The volatility appears to be time-dependent, which is standard for financial time series.

As described in the methodology section, I proceed by applying the Box-Jenkins approach, starting with checking for stationarity. Table 8 presents the results from applying the Dickey-Fuller test to each series. Examining the resulting p-values, we see that the null hypotheses that esi and vox are unit root processes are rejected at very low significance levels. For news, the null hypothesis is rejected at the 5% significance level. For oil, the null hypothesis is not rejected, suggesting that this series is not stationary.

Table 8: Results from Dickey-Fuller tests

                         oil      news      esi       vox
Test Statistic           2.037    -2.871    -4.653    -4.419
p-value                  0.999    0.049     0.000     0.000
#Lags Used               2.000    5.000     3.000     2.000
Number of Observations   41.000   318.000   376.000   333.000
Critical Value (1%)      -3.601   -3.451    -3.448    -3.450
Critical Value (5%)      -2.935   -2.871    -2.869    -2.870
Critical Value (10%)     -2.606   -2.572    -2.571    -2.571

The Dickey-Fuller test, for which the null hypothesis is that the series has a unit root, has been performed on each of the explanatory variables' time series. Automatic lag choice was used. The variables esi and vox appear to be stationary, as seen in the low p-values of their tests. For the variable news, this test also concludes stationarity if we use the 5% significance level. For oil, the null hypothesis is not rejected and it is concluded that this series has a unit root.

The Dickey-Fuller test is known to have the problem that it rejects the null hypothesis too often, that is, it is prone to type I error. This would lead to an incorrect conclusion that a series is stationary. With this in mind, I perform another stationarity test, namely the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) developed in Kwiatkowski et al.[1992]. In this test, as opposed to the Dickey-Fuller test, the null hypothesis is that the series is stationary. Therefore, not rejecting the null hypothesis constitutes evidence of stationarity (whereas in the Dickey-Fuller test not rejecting the null hypothesis is evidence against stationarity). The results from the KPSS test are presented in table9. If we use the 5% significance level, all series appear to be non-stationary, as is seen in the p-values. However, if we use a significance level of 1% or 2%, the results fully agree with the Dickey-Fuller results from table8. To ensure stationarity, I apply a first-difference transformation to all variables. The intuition is that for oil and news, both tests imply non-stationarity. Furthermore, for esi and vox, both test imply stationarity only at very particular significance levels (around 1-2%). With the transformed stationary series I proceed to model selection.

Table 9: Results from KPSS tests

                         oil      news     esi      vox
Test Statistic           0.394    0.403    0.151    0.184
p-value                  <0.01    <0.01    0.046    0.021
#Lags Used               1        4        4        4
Number of Observations   41       318      376      333
Critical Value (1%)      0.216    0.216    0.216    0.216
Critical Value (5%)      0.146    0.146    0.146    0.146
Critical Value (10%)     0.119    0.119    0.119    0.119

The KPSS test has been performed on each of the time series. For each variable, the null hypothesis is that the series is stationary. There is no closed form for the distribution of the KPSS test statistic, so the critical values used are those from the Monte Carlo simulations performed by Kwiatkowski et al. [1992]. At the 5% significance level, there is evidence that all series are non-stationary, whereas at a lower significance level of, for example, 1%, there is evidence that oil and news are non-stationary while esi and vox are stationary.
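A corresponding sketch for the KPSS step and the subsequent first-differencing, under the same assumed DataFrame df as above:

```python
# Sketch of the KPSS step; the null hypothesis is (level) stationarity.
import pandas as pd
from statsmodels.tsa.stattools import kpss

def kpss_report(series: pd.Series) -> dict:
    """KPSS test for level stationarity with automatic lag selection."""
    stat, pvalue, lags, crit = kpss(series.dropna(), regression="c", nlags="auto")
    return {"stat": stat, "p-value": pvalue, "#lags": lags, **crit}

# First differences of all four series, used in the remainder of step 2:
# df_diff = df[["oil", "news", "esi", "vox"]].diff().dropna()
```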

While the plots of each autocorrelation function (ACF) and partial autocorrelation function (PACF) can be seen in figures 12, 13, 14 and 15, the criterion I use for model selection is BIC minimization rather than conclusions drawn from the ACF/PACF plots. Table 10 presents the ARMA(p, q) models chosen by minimization of the AIC and the BIC. I estimated models with p and q as large as 5 and computed the AIC and BIC for each of them. The table reports the model for which each criterion is minimized. The penalty term for model size is greater for the BIC than for the AIC, which can be seen in my results: the two criteria chose the same model for oil and esi, while for news and vox the BIC chose a smaller model. I decided to accept the models chosen by the BIC.

Figure 12: ACF and PACF for oil

Figure 13: ACF and PACF for news

Figure 14: ACF and PACF for esi

Figure 15: ACF and PACF for vox

Table 10: Model selection

       vox      esi      oil      news
AIC    (2, 3)   (1, 3)   (0, 1)   (4, 1)
BIC    (1, 2)   (1, 3)   (0, 1)   (1, 1)

This table presents the selected models according to minimization of the Akaike and Bayesian information criteria. Within the class of ARMA(p, q) models, I varied p and q up to a maximum value of 5 (keeping in mind that the model should be parsimonious). For each (p, q) pair, the model was estimated using maximum likelihood, after which the AIC and BIC were computed. For the analysis below, I chose the model implied by minimization of the BIC, which either coincides with the AIC choice or is more parsimonious, because the BIC's penalty for model size is larger.
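A minimal sketch of such a grid search, assuming the differenced series are held in a DataFrame df_diff as above. The ARIMA class with a zero differencing order is used here to estimate ARMA models:

```python
# Sketch of ARMA(p, q) order selection by BIC minimization.
import itertools
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

def select_arma(series: pd.Series, max_order: int = 5):
    """Return the (p, q) order with the lowest BIC among all p, q <= max_order."""
    best_order, best_bic = None, float("inf")
    for p, q in itertools.product(range(max_order + 1), repeat=2):
        if p == 0 and q == 0:
            continue
        try:
            res = ARIMA(series, order=(p, 0, q)).fit()
        except Exception:  # some orders may fail to converge
            continue
        if res.bic < best_bic:
            best_order, best_bic = (p, q), res.bic
    return best_order, best_bic

# order, bic = select_arma(df_diff["vox"])
```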

Using the selected model for each of the explanatory variables, predictions are performed as described in the methodology and the results are presented in table 11. The columns are the predicted value, its standard error and a 90% confidence interval around it. As expected, the confidence intervals become wider when making predictions two years ahead.

Table 11: Prediction summary

          Value     Std.Err   90% CI
oil14     128.873   9.920     [112.557, 145.189]
oil15     131.735   13.538    [109.467, 154.004]
esi14     97.801    8.921     [83.127, 112.475]
esi15     97.613    8.921     [82.939, 112.287]
news14    190.489   21.574    [155.002, 225.975]
news15    195.148   30.056    [145.710, 244.587]
vox14     18.432    4.517     [11.002, 25.862]
vox15     18.457    6.435     [7.873, 29.041]

The prediction summary shows the predicted values of each explanatory variable in 2014 and 2015, obtained in step 2. As expected, forecasts made further into the future exhibit larger standard errors and wider prediction intervals.
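A sketch of how such a prediction summary could be produced; a synthetic AR(1) series stands in for one of the (differenced) uncertainty measures, so the numbers below are illustrative only:

```python
# Sketch of two-step-ahead forecasts with 90% intervals.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
eps = rng.normal(size=100)
y = np.zeros(100)
for t in range(1, 100):          # simulate a simple AR(1) as stand-in data
    y[t] = 0.5 * y[t - 1] + eps[t]

res = ARIMA(y, order=(1, 0, 0)).fit()
fc = res.get_forecast(steps=2)           # two horizons, e.g. 2014 and 2015
print(fc.summary_frame(alpha=0.10))      # mean, se, 5% and 95% bounds
```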

Let me evaluate the models chosen by BIC minimization. Figures 16, 17, 18 and 19 present the stationary versions of the original series, along with the fitted values implied by the chosen models. Except for oil, for which a very parsimonious model, an MA(1), was chosen, there do not appear to be extreme discrepancies between the fitted models and the original series.

Figure 16: Selected model: oil

Figure 17: Selected model: news

Figure 18: Selected model: esi

Figure 19: Selected model: vox

Using these models for forecasting requires two assumptions on the residuals, namely normality and no autocorrelation. I begin by examining the normality assumption. The residuals are plotted in figures 20, 21, 22 and 23, along with a normal density with mean zero and variance set equal to the variance of the residuals. For each series I ran two tests of normality: the Jarque-Bera test (a parametric test) and the Kolmogorov-Smirnov test (a nonparametric test). In both tests, the residuals are normally distributed under the null hypothesis. In all cases the null is rejected, with p-values below 0.001, implying evidence against the residuals being normally distributed. However, these tests do not perform well in small samples. In particular, they may lead to high rates of type I error, that is, falsely rejecting normality of the residuals. I decide to proceed under the normality assumption and argue that some support for it can be seen in the graphs: the residuals do exhibit a shape that resembles a normal distribution. With a small sample in hand, and statistical tests that do not perform well in small samples, explicitly assuming normality is required.
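For completeness, a sketch of the two normality tests on a residual series resid (a hypothetical name; heavy-tailed stand-in data are generated below, and the Kolmogorov-Smirnov test is run against a normal distribution with the residuals' own mean and standard deviation):

```python
# Sketch of the residual normality checks.
import numpy as np
from scipy import stats

# resid = res.resid  # residuals from a fitted model, as above
resid = np.random.default_rng(1).standard_t(df=3, size=60)  # stand-in data

jb_stat, jb_p = stats.jarque_bera(resid)
ks_stat, ks_p = stats.kstest(resid, "norm",
                             args=(resid.mean(), resid.std(ddof=1)))
print(f"Jarque-Bera p={jb_p:.4f}, Kolmogorov-Smirnov p={ks_p:.4f}")
```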

Figure 20: Residuals: oil

Figure 21: Residuals: esi

Figure 22: Residuals: news

Figure 23: Residuals: vox

I now proceed to examine the dependence between residuals, evaluating the white noise assumption. To this end, I perform Ljung-Box tests, in which the null hypothesis is that there is no autocorrelation. The test is performed for different lag lengths, that is, for autocorrelations of different orders. Figure 24 presents the p-values from these tests for each of the series, for up to 20 lags. It can clearly be seen that for all series except news, the null hypothesis is not rejected at standard significance levels. For the news series, there may be rejection when considering 3 and 4 lags, but there is no evidence of first or second order autocorrelation. Overall, I take these results as evidence for the lack of autocorrelation in the error terms.

Figure 24: Tests for autocorrelation in the residuals

The figure plots p-values from the Ljung-Box test for each series. For each series, the test is performed for various lag lengths, corresponding to tests about autocorrelation of different orders. High p-values are evidence of no autocorrelation, as this is the null hypothesis. For oil, esi and vox, the plot shows evidence of no autocorrelation of any order. For news there is evidence of potential autocorrelation of orders 3, 4 and 5.
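A sketch of the Ljung-Box checks behind figure 24, with stand-in residuals and assuming a recent statsmodels version in which acorr_ljungbox returns a DataFrame:

```python
# Sketch of the Ljung-Box test for residual autocorrelation up to lag 20.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

resid = np.random.default_rng(1).normal(size=60)  # stand-in residual series
lb = acorr_ljungbox(resid, lags=20)   # one row per lag, with lb_stat, lb_pvalue
print(lb["lb_pvalue"])                # high p-values: no autocorrelation
```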

5.2.3 Step 3 - Real GDP growth rate prediction intervals

Now I am ready to present the final result, the main interest of the paper, namely fan charts for the real GDP growth rate. Figures 25 and 26 plot the fan charts implied by the above results, using step 3 as described in the methodology section.19 For periods up to and including 2013, the plotted line consists of realizations, as I simulate the scenario in which a policymaker wants to make a prediction in 2013 about 2014. The value of the line in 2014 is the forecast made by the CPB in December 2013 about 2014. The bright square depicts the realization for 2014. The darkest shade of gray marks the prediction intervals obtained by using l_{t+h,t}, whereas the lightest shade of gray is obtained by using u_{t+h,t}. The intermediate shade of gray is obtained by using a linear combination of l_{t+h,t} and u_{t+h,t} with equal weights.

19 I also used the other three forecasts made by the CPB and the results are qualitatively the same, so I do not repeat them here. These fan charts can be found in the appendix.

Figure 25: Fan chart using MAE, December forecasts

This figure presents the fan chart for the growth rate of real GDP in the Netherlands using predictions made in 2013 about 2014 and 2015, based on the methodology developed in this paper. It is based on the CPB December forecasts, using MAE as the measure of aggregated cross-country forecast errors. The plotted line represents the realizations (up to 2013) and the predictions (for 2014 and 2015, made by the CPB in December). The bright squares are the realizations. The narrowest intervals are in the darkest area, corresponding to the lowest tolerance, while the lightest area features the largest degree of tolerance. In both figures, we see that the realizations in 2014 and 2015 are located in the darkest area (that is, the narrowest prediction interval).
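As a toy illustration of how such nested bands can be drawn, the following sketch plots three shaded intervals around a central path; all numbers are placeholders rather than results from the paper:

```python
# Toy sketch of a fan chart: three nested gray bands around a central path.
import matplotlib.pyplot as plt
import numpy as np

years = np.array([2013, 2014, 2015])
point = np.array([-0.5, 0.8, 1.4])        # placeholder central path
half_widths = [2.2, 1.4, 0.7]             # widest to narrowest (placeholders)
shades = ["0.85", "0.6", "0.35"]          # light to dark grayscale

fig, ax = plt.subplots()
for hw, shade in zip(half_widths, shades):  # draw wide, light bands first
    ax.fill_between(years, point - hw, point + hw, color=shade)
ax.plot(years, point, color="black")
ax.set_xticks(years)
ax.set_ylabel("Real GDP growth rate (%)")
plt.show()
```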

Figure 26: Fan chart using RMSE, December forecasts

This figure presents a fan chart using RMSE; otherwise it is constructed just as figure 25. The two figures are qualitatively similar, implying that policymakers should not be too concerned about which of these two aggregations of forecast errors is used.

From examining the fan chart, one drawback is immediately clear: even the narrowest prediction intervals are quite wide. For example, in figure 26 the narrowest prediction intervals run from roughly −0.5 up to 2. Having said that, it may be the case that practicing policymakers are relatively conservative and would in fact want the narrowest intervals to be of sizable width. Certainly, the widest intervals are huge and therefore uninformative. But this is to be expected of the widest intervals: exaggerated conservatism leads to predictions which are not useful. I will now move on to present the results from using the same methodology under a different specification, examining how the fan charts are affected.

5.3 Specification 2

I examined all possible models involving subsets of the four variables oil, esi, news and vox. The only specification, apart from the full model analyzed above, which gave reasonable results includes only the variables esi and vox. Other specifications led to bad fit, as seen in their adjusted R2, and for some of them also to individual and joint insignificance. In specification 2, I therefore select the variables esi and vox and present the results from applying step 1 in tables 12 and 13. The coefficients

are jointly and individually significant and have the expected signs. Table 14 shows the prediction summary. The prediction method is the same as before. Figure 27 plots the CPB predictions and fitted aggregate forecast errors based on the estimates using only esi and vox.

Table 12: Dependent variable: forecast error (Mean absolute error)

No. Observations: 29       AIC: 79.4041
Df Model: 2                BIC: 83.5060
Df Residuals: 26           Log-Likelihood: -36.702
R-squared: 0.381           F-statistic: 8.013
Adj. R-squared: 0.334      Prob (F-statistic): 0.00195

         Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const    6.347    1.966      3.228    0.003    2.994    9.701
esi      -0.063   0.019      -3.292   0.003    -0.095   -0.030
vox      0.064    0.026      2.462    0.021    0.020    0.109

This table presents the regression results from including a smaller set of explanatory variables, which is motivated by the fact that policymakers typically do not enjoy an abundance of observations. I experimented with various subsets of the variables in specification 1, but only this specification seemed reasonable in terms of individual significance, joint significance and model fit. Both esi and vox have the expected sign. While this model is more parsimonious, it also provides a worse fit than specification 1, as is seen in the lower adjusted R2.

Table 13: Dependent variable: forecast error (RMSE)

No. Observations: 29       AIC: 86.1838
Df Model: 2                BIC: 90.2857
Df Residuals: 26           Log-Likelihood: -40.092
R-squared: 0.392           F-statistic: 8.382
Adj. R-squared: 0.345      Prob (F-statistic): 0.00155

         Coef.    Std.Err.   t        P>|t|    [0.05    0.95]
const    6.754    2.210      3.056    0.005    2.985    10.523
esi      -0.067   0.021      -3.120   0.004    -0.103   -0.030
vox      0.083    0.029      2.825    0.009    0.033    0.133

This table uses RMSE as the dependent variable and is qualitatively the same as table 12.
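A sketch of the step-1 regression under specification 2; the DataFrame, its columns (fe for the aggregate forecast error, esi and vox) and the synthetic numbers below are stand-ins, not results from the paper:

```python
# Sketch of the step-1 OLS regression of forecast errors on uncertainty measures.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
data = pd.DataFrame({"esi": rng.normal(100, 9, 29),    # synthetic stand-ins
                     "vox": rng.normal(18, 5, 29)})
data["fe"] = 6.3 - 0.06 * data["esi"] + 0.07 * data["vox"] + rng.normal(0, 0.5, 29)

X = sm.add_constant(data[["esi", "vox"]])
model = sm.OLS(data["fe"], X).fit()
print(model.summary())
print(model.conf_int(alpha=0.10))   # 90% bounds, as reported in the tables
```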

Table 14: Prediction summary (specification 2)

          Value    Std.Err   90% CI
oil14     NaN      NaN       NaN
oil15     NaN      NaN       NaN
esi14     97.801   8.921     [83.127, 112.475]
esi15     97.613   8.921     [82.939, 112.287]
news14    NaN      NaN       NaN
news15    NaN      NaN       NaN
vox14     18.432   4.517     [11.002, 25.862]
vox15     18.457   6.435     [7.873, 29.041]

The prediction summary shows the predicted values of the variables esi and vox for 2014 and 2015 when using specification 2; the rows for oil and news are empty because these variables are not part of this specification. As expected, standard errors are larger and prediction intervals are wider for 2015 compared to 2014, when predictions are made in 2013.

Figure 27: CPB predictions and fitted aggregate forecast errors (spring, RMSE) - Specification 2

In order to confirm that the aggregate European data can be used for country-specific analysis (the Netherlands only), I examine the realizations and the prediction intervals given by the fitted values from the regression in table 13. These fitted prediction intervals are plotted around the spring CPB projections. In 71.43% of the years, the realization was within the fitted prediction interval.
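The coverage figure can be computed mechanically; a sketch assuming arrays lower, upper and realized for the fitted interval bounds and the realizations (hypothetical names, placeholder values):

```python
# Sketch of the coverage computation: share of years in which the
# realization falls inside the fitted prediction interval.
import numpy as np

lower = np.array([-1.0, -0.5, 0.2])     # placeholder interval bounds
upper = np.array([2.5, 3.0, 2.8])
realized = np.array([1.3, -0.7, 1.1])   # placeholder realizations

coverage = np.mean((realized >= lower) & (realized <= upper)) * 100
print(f"{coverage:.2f}% of years inside the interval")
```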

Figures 28 and 29 present the fan charts that result from using specification 2. Let me compare these with the fan charts that were found using the full model. The first observation is that all prediction intervals in specification 2 are narrower than the prediction intervals in the full model fan chart. Notice that for both 2014 and 2015, the realization is not included in the narrowest interval, whereas this did hold for the full model. Therefore, the full model does not allow for very confident predictions, whereas specification 2 does.

Figure 28: Fan chart using MAE, December forecasts (Specification 2)

The method of constructing the fan chart in specification 2 is the same as before. However, the realizations in 2014 and 2015 are in the intermediate area instead of in the darkest area, as they were in specification 1. Note that the prediction intervals are narrower than in specification 1 (compare with figure 25).

Figure 29: Fan chart using RMSE, December forecasts (Specification 2)

This figure presents a fan chart using RMSE; otherwise it is constructed just as figure 28. As in specification 1, the two figures are qualitatively similar.

5.4 Comparison with CPB fan chart

The CPB recently published a series of fan charts for several macroeconomic variables, namely real GDP growth, HICP inflation, unemployment and the general government financial balance. I would like to briefly compare the results from my paper with those of the CPB, as these feature two different methods of creating fan charts. The real GDP growth rate fan charts for 2016 and 2017, based on predictions made in 2015, are presented in figure 30. The intervals are 30%, 60% and 90% confidence intervals around the predictions.20 The narrowest interval presented, the 30% interval, covers roughly one percentage point, whereas my narrowest interval covers more than 2 percentage points in the full model and roughly 0.5 percentage point in specification 2. The fan charts presented by the CPB do not account for uncertainty measures, as my method does. Wider intervals are therefore to be expected, which is the case for the full model but not for specification 2. Since the full model matches what the CPB is looking for, in the sense of obtaining wider intervals which account for uncertainty, and since in step 1 it fits the data better (as seen in the higher adjusted R2), I advise the

20 These fan charts were published several days before this section was written. All fan charts can be found at https://www.cpb.nl/en/article/fan-charts-september-2016.

CPB to use the full model for creating reformed fan charts.

Figure 30: CPB fan chart (September 20, 2016)

6 Conclusion

In this paper I evaluated the method used by the Bank of England for constructing prediction intervals for forecasts of macroeconomic variables, and I have developed an alternative method which achieves the same goal. The main argument against the Bank of England's fan charts, namely their subjectivity, which makes them not reproducible, does not apply to my method. My technique is also far less costly, as it does not require well-paid experts to meet and put forward their assessments of the future. My goal is for the method put forward in this paper to be further developed and used by the CPB and other governmental agencies when making dynamic macroeconomic predictions that take uncertainty into account.

There are at least three disadvantages to my method. Firstly, the prediction intervals are not stated with a corresponding degree of confidence or, equivalently, a significance level. Secondly, the widest intervals were found to be too wide in my specification. Finally, as was expected, the prediction intervals are quite sensitive to the model specification from step 1: assuming different explanatory variables leads to different fan charts. They are also sensitive to the time series models used to perform predictions for the explanatory variables in step 2. When comparing my method to the traditional fan charts, one should keep these drawbacks in mind, along with the main benefits of my method: its lack of subjectivity and its reproducibility. As both methods have advantages and disadvantages, I believe that a thorough empirical comparison between the two would be fruitful.

Let me mention several potential directions for future research. Firstly, it is possible to assume different, more complex econometric models in step 1, rather than a simple linear regression. Furthermore, one can try adding other uncertainty measures as explanatory variables. To this end, it would be helpful to use data which are sampled more frequently than yearly. As the predictions are usually made in yearly terms, one may consider interpolating them to obtain more frequent data to go along with the uncertainty measures, whose frequency is at least monthly. Secondly, it would seem reasonable to use only country-specific forecast errors as the dependent variable (instead of aggregating over countries as was done in this paper). For example, if the CPB would like to use my method, I would advise obtaining a long time series of historical forecast errors in the Netherlands. Thirdly, one can provide a critique of the method introduced in this paper, along with a comparison of its performance with that of the Bank of England's fan charts. Lastly, step 2 of the methodology can be improved by performing multivariate forecasting of the explanatory variables, removing the no-correlation assumption.

I will conclude with some policy recommendations. This paper has shown that there is potential for using uncertainty indicators to construct confidence intervals around macroeconomic forecasts. If this method is adopted, it is recommended to phase it in gradually, perhaps using both this methodology and the old one for several years for evaluation purposes. I suggest that the variables esi, news, vox and oil, as defined in the data section, be used as explanatory variables for examining the effect of uncertainty on forecast errors in step 1, rather than any other subset of these variables. These fit the data reasonably well, and the resulting fan chart has wider prediction intervals than the fan chart made using the old methodology, as they account for uncertainty.

Appendix

In this appendix, I show the figures in which I apply the same method as in the body of the paper, but with different measures and CPB forecasts. In figures 31, 32, 33 and 34, I apply the MAE measure to the different forecasts that the CPB makes every year (spring, June, September and December) to examine how suitable the aggregate data is for analyzing predictions for the Netherlands. Figures 35, 36 and 37 use the RMSE measure. The results show that the percentage of years in which realizations fall within the fitted prediction interval from step 1 around the CPB prediction remains very high (nearly 100%). The same applies to specification 2 (figures 51 to 57). Figures 45 up to 50 are fan charts which apply the full model using both MAE and RMSE for the different forecast moments of the CPB. Figures 58 to 63 are the fan charts which apply the specification 2 model.

Figure 31: CPB predictions and fitted aggregate forecast errors, (spring, MAE)

Figure 32: CPB predictions and fitted aggregate forecast errors, (June, MAE)

Figure 33: CPB predictions and fitted aggregate forecast errors, (September, MAE)

Figure 34: CPB predictions and fitted aggregate forecast errors, (December, MAE)

Figure 35: CPB predictions and fitted aggregate forecast errors, (June, RMSE)

Figure 36: CPB predictions and fitted aggregate forecast errors, (September, RMSE)

Figure 37: CPB predictions and fitted aggregate forecast errors, (December, RMSE)

Figure 38: CPB predictions and fitted aggregate forecast errors, (June, MAE) (Specification 2)

Figure 39: CPB predictions and fitted aggregate forecast errors, (September, MAE) (Specification 2)

Figure 40: CPB predictions and fitted aggregate forecast errors, (December, MAE) (Specification 2)

Figure 41: CPB predictions and fitted aggregate forecast errors, (spring, RMSE) (Specification 2)

Figure 42: CPB predictions and fitted aggregate forecast errors, (June, RMSE) (Specification 2)

Figure 43: CPB predictions and fitted aggregate forecast errors, (September, RMSE) (Specification 2)

Figure 44: CPB predictions and fitted aggregate forecast errors, (December, RMSE) (Specification 2)

Figure 45: Fan chart using MAE, Spring forecasts

Figure 46: Fan chart using RMSE, Spring forecasts

Figure 47: Fan chart using MAE, June forecasts

Figure 48: Fan chart using RMSE, June forecasts

Figure 49: Fan chart using MAE, September forecasts

Figure 50: Fan chart using RMSE, September forecasts

Figure 51: CPB predictions and fitted aggregate forecast errors, (June, MAE) (Specification 2)

Figure 52: CPB predictions and fitted aggregate forecast errors, (September, MAE) (Specification 2)

Figure 53: CPB predictions and fitted aggregate forecast errors, (December, MAE) (Specification 2)

Figure 54: CPB predictions and fitted aggregate forecast errors, (spring, RMSE) (Specification 2)

Figure 55: CPB predictions and fitted aggregate forecast errors, (June, RMSE) (Specification 2)

Figure 56: CPB predictions and fitted aggregate forecast errors, (September, RMSE) (Specification 2)

Figure 57: CPB predictions and fitted aggregate forecast errors, (December, RMSE) (Specification 2)

Figure 58: Fan chart using MAE, Spring forecasts (specification 2)

Figure 59: Fan chart using RMSE, Spring forecasts (specification 2)

Figure 60: Fan chart using MAE, June forecasts (specification 2)

Figure 61: Fan chart using RMSE, June forecasts (specification 2)

Figure 62: Fan chart using MAE, September forecasts (specification 2)

Figure 63: Fan chart using RMSE, September forecasts (specification 2)

References

Scott R. Baker, Nicholas Bloom, and Steven J. Davis. Measuring economic policy uncertainty. Working Paper 21633, National Bureau of Economic Research, October 2015. URL http://www.nber.org/papers/w21633.

George E. P. Box and Gwilym M. Jenkins. Time Series Analysis: Forecasting and Control. Holden-Day, San Francisco, CA, 1976.

Matthieu Cornec. Constructing a conditional GDP fan chart with an application to French business survey data. OECD Journal: Journal of Business Cycle Measurement and Analysis, 2013(2):109–127, 2014.

Kevin Dowd. The GDP fan charts: An empirical evaluation. National Institute Economic Review, 203(1):59–67, 2008.

Robert Elder. Assessing the MPC's fan charts. Bank of England Quarterly Bulletin, Autumn, 2005.

Chicago Board Options Exchange. VIX white paper. URL http://www.cboe.com/micro/vix/vixwhite.pdf, 2003.

James Douglas Hamilton. Time Series Analysis. Princeton University Press, Princeton, NJ, 1994.

L. Kroese, S. Kok, and J. Parlevliet. Beleidsonzekerheid in Nederland. Economisch Statistische Berichten, 4715:464–467, 2015.

Denis Kwiatkowski, Peter C. B. Phillips, Peter Schmidt, and Yongcheol Shin. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? Journal of Econometrics, 54(1–3):159–178, 1992.

Marco Fioramanti, Laura González Cabanillas, Björn Roelstraete, Salvador Adrián Ferrandis Vallterra, et al. European Commission's forecasts accuracy revisited: Statistical properties and possible causes of forecast errors. Technical report, Directorate General Economic and Financial Affairs (DG ECFIN), European Commission, 2016.

Pavel Čížek. Econometrics 3 lecture notes: Time series, Spring 2014.
