
Forecasting with VARs with Long- and Short-Run Restrictions: A Monte-Carlo Study∗

Farshid Vahid†
School of Economics, Australian National University
Crisp Building 026, ACT 0200, Australia

Osmani Teixeira de Carvalho Guillén
Banco Central do Brasil and Graduate School of Economics (EPGE), Brazil

João Victor Issler
Graduate School of Economics (EPGE), Getulio Vargas Foundation
Praia de Botafogo 190 s. 1111, Rio de Janeiro, RJ 22253-900, Brazil

Preliminary version: July, 2004

Abstract

Despite the commonly held belief that macroeconomic aggregates display short-run comovement, there has been little discussion about the econometric consequences of this feature of the data. We use exhaustive Monte-Carlo simulations to investigate the importance of restrictions implied by common-cyclical features for estimates and forecasts based on cointegrated vector autoregressive models. First, we show that the "best" empirical model developed without common-cycle restrictions need not nest the "best" model developed with those restrictions. This is due to possible differences in the lag lengths chosen by model selection criteria for the two alternative models. Second, we show that the costs of ignoring common cyclical features in vector autoregressive modelling can be high in terms of forecast accuracy. Third, we find that the Hannan-Quinn and the Schwarz criteria perform well among model selection criteria in simultaneously selecting the lag length and rank of cointegrated vector autoregressions.

Keywords: Reduced rank models, model selection criteria, forecasting, decomposition.

JEL Classification: C32, C53.

∗Acknowledgments: Parts of this paper were written while Osmani T. Guillén was visiting Monash University, whose hospitality is gratefully acknowledged. João Victor Issler and Osmani T. Guillén acknowledge the support of CNPq-Brazil and PRONEX, and of CAPES, respectively.
†Corresponding author.

1 Introduction

Although vector autoregressive (VAR) and vector error-correction (VEC) models have become the workhorses of macroeconometric studies, one of their shortcomings is the excessive number of parameters relative to the sample sizes one is usually forced to work with. For example, with post-war quarterly data and a VAR with three variables and eight lags, there are seventy-five parameters to be estimated from about two hundred data points on each variable. One possible way of reducing the parameter space is to impose restrictions. These usually come from cointegration (e.g., Engle and Granger (1987) and Johansen (1991)) and from common cycles (e.g., Engle and Kozicki (1993), Vahid and Engle (1993, 1997) and Hecq, Palm and Urbain (2004)). The former are restrictions on the long-run behavior of the variables, and the latter are restrictions on their short-run behavior. The effects of imposing them in VAR models can be quite different. For example, Engle and Yoo (1987) show that imposing cointegration restrictions helps VAR forecasting accuracy only at long horizons. In contrast, Vahid and Issler (2002) show that serial-correlation-common-feature (SCCF) restrictions improve forecasting accuracy at all horizons. A well-known result is that unbiased restricted models produce better forecasts. Imposing valid restrictions can indeed improve forecasting accuracy. However, researchers are usually aware that there is always the risk of imposing false restrictions, which calls for conservative behavior: it is presumably better to use a possibly inefficient model than to risk using an inconsistent model in the search for parsimony. Recent Monte-Carlo results in Vahid and Issler challenge this view with respect to VAR models with common cycles. They argue that the cost of ignoring common-cycle restrictions is more than just the efficiency loss.
This happens because the usual practice in applied work of choosing lag length by information criteria will severely underparameterize the model in this case. Common cycles imply a reduced-rank structure for the coefficient matrices, but they imply no restrictions on the number of lags of a VAR. Standard criteria therefore choose too small a lag length for VARs with common-cycle restrictions, simply because shortening the lag length is the only way available to them to achieve parsimony. As a result, full-rank models chosen by information criteria are frequently misspecified. For such misspecified models, theory cannot tell us what the consequences of incorporating rank restrictions will be.

In this paper we argue that SCCF restrictions should be taken seriously in VAR modelling, especially when these models are used for forecasting purposes. Our suggestion is based on a wide-scale Monte-Carlo exercise in which data are generated by VARs containing short-run and long-run restrictions simultaneously: SCCF and cointegration restrictions, respectively. As will become clear in the next sections, the Monte-Carlo design is far from trivial in this case, because both types of rank restrictions have to hold in every data generating process. In our simulations we cover three important issues in model building, estimation and forecasting. First, we examine the behavior of standard information criteria in choosing lag length for cointegrated VARs with SCCF restrictions. Second, we compare the forecasting accuracy of fitted VARs when only cointegration restrictions are imposed and when cointegration and SCCF restrictions are imposed jointly. Finally, as a by-product of the latter exercise, we propose a new estimation algorithm in which short- and long-run restrictions interact to estimate jointly the cofeature and the cointegrating spaces, respectively. This algorithm closely follows the idea of a weak-form reduced-rank structure,

suggested by Hecq, Palm and Urbain, in which the reduced-rank structure of the lagged coefficient matrices in the vector error-correction model (VECM) differs from that of the adjustment coefficient matrix. In the first pass of our algorithm, only weak-form SCCF restrictions are imposed, with no a priori restriction on the cointegrating rank. Based on first-pass estimates, the long-run impact matrix is estimated, and short- and long-run restrictions interact to yield the final VECM estimates. At the end, it is possible to conduct inference on the cointegrating rank of the system.

The focus of the simulations is the accuracy of multi-step-ahead out-of-sample forecasts. We design them so that the results are relevant for an applied macroeconomist who estimates a relatively large number of parameters from a limited number of data points. To that end, we consider a variety of data generating processes (DGPs) and sample sizes that are similar to the "typical" data sets applied researchers encounter in practice.

The current study extends the work of Vahid and Issler (2002) in two dimensions. First, the unrestricted model considered here is a cointegrated VAR, i.e., a VECM, whereas in Vahid and Issler the unrestricted model was a VAR with no cointegration restrictions. As will become apparent below, selecting a large number of DGPs for VARs with long- and short-run restrictions is far from trivial, and being able to do so makes the results of our Monte-Carlo exercise more robust. Moreover, considering cointegration restrictions is more realistic, because cointegration is a common occurrence in macroeconomic models and data. Our study therefore adds to what is currently known about SCCF restrictions, and it also provides relevant information of wider applicability.
Second, we propose a new estimation algorithm in which short- and long-run restrictions interact to estimate jointly the cofeature and the cointegrating spaces, respectively. Because of its performance in small samples, we believe it has the potential to form the basis of future estimators of VECM coefficients in the presence of SCCF restrictions.

The results of our simulation exercise, together with those of previous studies on SCCF restrictions in VAR analysis, call for a change in focus in the literature on forecasting with these models. In the past there has been considerable effort examining the importance of cointegration restrictions in VAR models (see, among others, Engle and Yoo (1987), Clements and Hendry (1995), and Lin and Tsay (1996)). By contrast, there has been little work examining the importance of common-cyclical features in VAR analysis. However, the simulation results in Vahid and Issler and the enriched environment considered here show superior forecasting performance for models incorporating SCCF restrictions. This holds irrespective of whether cointegration is present in the data and of the forecasting horizon under study.

The outline of the paper is as follows. Section 2 states the reduced-rank restrictions that common cyclical fluctuations impose on the parameters of cointegrated VAR models and presents the model selection criteria for reduced-rank models; see also the discussion of these issues in the Appendix. Section 3 describes our Monte-Carlo design; see also the discussion of DGP selection in the Appendix. Section 4 presents the simulation results. Section 5 presents the main conclusions of the paper, as well as suggestions for further research.
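The parameter count quoted at the start of this section can be reproduced with simple arithmetic. The sketch below makes one assumption that the text does not state: the seventy-five parameters are read as the 3 × 3 × 8 = 72 slope coefficients plus one intercept per equation. The helper names are hypothetical. The second function counts the free parameters of a rank-r coefficient matrix, r(n + K − r). This is the same count that enters the penalty terms of the model selection criteria in Section 2.1.

```python
def var_param_count(n_vars: int, n_lags: int, intercept: bool = True) -> int:
    """Free coefficients in an unrestricted VAR(n_lags) with n_vars equations."""
    slopes = n_vars * n_vars * n_lags              # one n x n matrix per lag
    return slopes + (n_vars if intercept else 0)   # assumption: one intercept per equation


def reduced_rank_param_count(n_rows: int, n_cols: int, rank: int) -> int:
    """Free parameters of an n_rows x n_cols matrix restricted to the given rank."""
    return rank * (n_rows + n_cols - rank)


print(var_param_count(3, 8))                  # 72 slopes + 3 intercepts = 75
print(reduced_rank_param_count(3, 3 * 8, 1))  # rank-one 3 x 24 slope block = 26
```

Under a rank-one restriction, the 72 slope coefficients of this example collapse to 26 free parameters without shortening the lag length. This is the kind of parsimony that common-cycle restrictions offer.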

2 Forecasting VECMs with SCCF restrictions

As in most applied macroeconomic research, we assume that the objective is to build a model for the growth rates of a vector of n economic variables. We denote the levels of these variables at time t by $Y_t$, their logarithms by $y_t$, and their growth rates (i.e., the first difference of the logarithm of $Y_t$) by $\Delta y_t$. We make the reasonable assumption that $\Delta y_t$ is stationary and the simplifying assumption that $\Delta y_t$ has mean zero (without any loss of generality). We start with the Wold representation of $\Delta y_t$, i.e.,

$$\Delta y_t = C(L)\,\varepsilon_t, \tag{1}$$

where $C(L)=\sum_{j=0}^{\infty}C_jL^j$ is a matrix polynomial in the lag operator and $C_0=I_n$. Following Beveridge and Nelson (1981) and Stock and Watson (1988), the log-level series $y_t$ can be decomposed into common trends and cycles. We refer to this as the Beveridge-Nelson-Stock-Watson, or BNSW, decomposition. Use the identity $C(L)=C(1)+\Delta C^*(L)$, with $C^*(L)=\sum_{j=0}^{\infty}C_j^*L^j$, ignore the initial value $y_0$, and integrate both sides of (1). This gives¹:

$$y_t = C(1)\sum_{j=1}^{t}\varepsilon_j + C^*(L)\,\varepsilon_t = \tau_t + c_t, \tag{2}$$

where $\tau_t = C(1)\sum_{j=1}^{t}\varepsilon_j$ and $c_t = C^*(L)\,\varepsilon_t$ are the trend and cyclical components of $y_t$, respectively. In the BNSW decomposition, the n variables in $y_t$ are decomposed into n random-walk components (stochastic trends) and n stationary components (stochastic cycles). Suppose $C(1)$ has rank $n-q$ ($q>0$). Then the stochastic trends in $y_t$ can be characterized as linear combinations of only $n-q$ common random walks. In this case $y_t$ is said to be cointegrated, or to have common stochastic trends, with q linearly independent cointegrating vectors (see Engle and Granger, 1987). If $C^*(L)$ has rank $r$ ($r$

$$y_t = A_1 y_{t-1} + \dots + A_{p+1} y_{t-p-1} + \varepsilon_t. \tag{3}$$

Under cointegration, the appropriate model for $\Delta y_t$ in this case is a VECM, i.e.,

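$$\begin{aligned} \Delta y_t &= A_1^*\Delta y_{t-1} + \dots + A_p^*\Delta y_{t-p} + \gamma\alpha' y_{t-1} + \varepsilon_t\\ &= \begin{bmatrix} A_1^* & \cdots & A_p^* & \gamma \end{bmatrix} \begin{bmatrix} \Delta y_{t-1}\\ \vdots\\ \Delta y_{t-p}\\ \alpha' y_{t-1} \end{bmatrix} + \varepsilon_t\\ &= \Phi x_t + \varepsilon_t, \end{aligned} \tag{4}$$

¹ See Stock and Watson (1988) or Vahid and Engle (1993) for more details.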

where $\gamma$ and $\alpha$ are full-rank matrices of order $n\times q$, with $\gamma\alpha' = -\left(I - \sum_{i=1}^{p+1}A_i\right)$, $A_j^* = -\sum_{i=j+1}^{p+1}A_i$, $\Phi = \begin{bmatrix} A_1^* & \cdots & A_p^* & \gamma \end{bmatrix}$ and $x_t = \left[\Delta y_{t-1}', \dots, \Delta y_{t-p}', (\alpha' y_{t-1})'\right]'$.

We now discuss the restrictions that cointegration or common-cyclical features impose on (2), (3), and (21) as far as forecasting is concerned. See Vahid and Engle for a complete discussion of parameter restrictions for these representations. If the elements of $y_t$ are cointegrated, $C(1)$ has rank $n-q$ ($q>0$). The stochastic trends in (2) can then be characterized as linear combinations of only $n-q$ common random walks, and the q linearly independent cointegrating vectors are stacked in the $\alpha'$ entering (21); see Engle and Granger (1987). Cointegration restricts the behavior of multi-step-ahead forecasts, as discussed in Engle and Yoo (1987); see also Beveridge and Nelson for a complete picture. Suppose that we are interested in $E_t y_{t+h}$. We can write

$$\begin{aligned} y_{t+h} ={}& C(1)\sum_{j=1}^{t}\varepsilon_j + C(1)\sum_{j=t+1}^{t+h}\varepsilon_j\\ &+ C_0^*\varepsilon_{t+h} + C_1^*\varepsilon_{t+h-1} + \cdots + C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots, \end{aligned} \tag{5}$$

where the first line in (5) relates to the trend component and the second to the cyclical component. It is straightforward to show that the forecast of the trend component obeys

$$E_t\tau_{t+h} = C(1)\sum_{j=1}^{t}\varepsilon_j = \lim_{h\to\infty}E_t\tau_{t+h} = \tau_t, \tag{6}$$

whereas the forecast of the cyclical component obeys:

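$$\begin{aligned} E_t c_{t+h} &= C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots, \quad\text{and}\\ \lim_{h\to\infty}E_t c_{t+h} &= \lim_{h\to\infty}\left(C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots\right) = 0. \end{aligned} \tag{7}$$

Therefore,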

$$\begin{aligned} E_t y_{t+h} &= \tau_t + \left(C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots\right),\\ \lim_{h\to\infty}E_t y_{t+h} &= \tau_t, \quad\text{and}\\ \lim_{h\to\infty}E_t\,\alpha' y_{t+h} &= 0. \end{aligned} \tag{8}$$

The second line in (8) shows that the random-walk trend $\tau_t$ is the long-horizon forecast of the series $y_t$. Moreover, cointegration constrains only the limiting forecasting behavior of the level series $y_t$. The last line in (8) shows that forecasts are collinear only at the infinite horizon, because $\alpha' C(1) = 0$. In particular, cointegration has no impact whatsoever on forecasts of the cyclical component.
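A small numerical sketch makes (4)-(8) concrete. The helper names `var_to_vecm` and `forecast_levels` are our own (hypothetical). The bivariate parameter values are illustrative only and are not taken from the paper's DGPs. The code maps level-VAR coefficients into VECM form using the relations stated under (4) and iterates point forecasts. It then shows that $\alpha' E_t y_{t+h}$ is nonzero at short horizons and dies out only as h grows. Meanwhile, the level forecasts settle at a common long-run value, which plays the role of the trend $\tau_t$ in (8).

```python
import numpy as np


def var_to_vecm(A):
    """Map level-VAR matrices [A_1, ..., A_{p+1}] to (Pi, [A_1*, ..., A_p*]),
    with Pi = gamma alpha' = -(I - sum_i A_i) and A_j* = -sum_{i=j+1}^{p+1} A_i."""
    n = A[0].shape[0]
    Pi = -(np.eye(n) - sum(A))
    A_star = [-sum(A[j:]) for j in range(1, len(A))]
    return Pi, A_star


def forecast_levels(A, history, h):
    """Point forecasts E_t y_{t+1}, ..., E_t y_{t+h} for y_t = sum_i A_i y_{t-i}
    (no intercept); `history` lists past level vectors, oldest first."""
    path = [np.asarray(v, dtype=float) for v in history]
    for _ in range(h):
        path.append(sum(Ai @ path[-1 - i] for i, Ai in enumerate(A)))
    return np.array(path[len(history):])


# Hypothetical bivariate example with one cointegrating vector alpha = (1, -1)'
gamma = np.array([[-0.2], [0.1]])
alpha = np.array([[1.0], [-1.0]])
A = [np.eye(2) + gamma @ alpha.T]     # VAR(1) in levels, i.e. p = 0 lagged differences

Pi, A_star = var_to_vecm(A)           # Pi recovers gamma alpha'; A_star is empty here
y_t = np.array([2.0, 1.0])            # alpha' y_t = 1
f = forecast_levels(A, [y_t], h=60)
coint = f @ alpha.ravel()             # alpha' E_t y_{t+h} = 0.7**h in this example
print(coint[:4])                      # approximately 0.7, 0.49, 0.343, 0.2401
print(f[-1])                          # close to the common long-run level (4/3, 4/3)
```

Because $\alpha'(I+\gamma\alpha') = (1+\alpha'\gamma)\alpha'$ with $\alpha'\gamma = -0.3$, the cointegrating combination of the forecasts shrinks geometrically. It is never exactly zero at a finite horizon, which is the sense in which cointegration only constrains the limit.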

If there are r common stochastic cycles in $y_t$, then $C^*(L)$ in (2) has rank r, and the $n\times(np+q)$ matrix $\Phi$ must have rank r ($r$

because $\tilde\alpha' C_i^* = 0$ for all i. A similar argument applies to the autoregressive representation,

$$\begin{aligned} E_t\Delta y_{t+h} &= \Phi\,E_t x_{t+h}, \quad\text{and}\\ E_t\,\tilde\alpha'\Delta y_{t+h} &= 0, \end{aligned} \tag{10}$$

because $\tilde\alpha'\Phi = 0$. Results (9) and (10) show that forecasts of the first differences of $y_t$ are collinear at every horizon h, not only in the limit. This is related to the fact that SCCF restrictions also make the forecasts of the cyclical component of $y_t$ collinear at every horizon h:

$$\begin{aligned} E_t c_{t+h} &= C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots, \quad\text{and}\\ E_t\,\tilde\alpha' c_{t+h} &= 0. \end{aligned}$$

Therefore, in terms of the forecasts of the series $y_t$, we get:

$$\begin{aligned} E_t y_{t+h} &= \tau_t + \left(C_h^*\varepsilon_t + C_{h+1}^*\varepsilon_{t-1} + \cdots\right),\\ E_t\,\tilde\alpha' y_{t+h} &= \tilde\alpha'\tau_t, \end{aligned} \tag{11}$$

where the last line uses the fact that the forecasts of the cyclical component are collinear at every horizon. The key observation from the analysis above is the following. Cointegration generates collinear forecasts only in the limit, as the horizon grows infinitely large, whereas SCCF restrictions generate collinear forecasts at every horizon. The variance of forecast errors grows hopelessly large as the horizon increases, so time-series models are only helpful at short horizons. At these horizons, however, the only type of restriction that can be of any help is the SCCF restriction, not cointegration.
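The contrast between (8) and (10)-(11) can be checked numerically. The sketch below uses a hypothetical bivariate VECM with one lagged difference. In it, the cofeature vector $\tilde\alpha = (2, -1)'$ annihilates both $A_1^*$ and $\gamma$, so that $\tilde\alpha'\Phi = 0$ as (10) requires. The parameter values are ours. By our own calculation, they give the levels companion matrix eigenvalues 1, 0.6 ± 0.2i and 0, i.e., a single common trend with stable dynamics otherwise.

```python
import numpy as np

# Hypothetical VECM: dy_t = A1* dy_{t-1} + gamma alpha' y_{t-1} + eps_t
A1_star = np.array([[0.2, 0.1], [0.4, 0.2]])
gamma = np.array([[0.2], [0.4]])
alpha = np.array([[1.0], [-1.0]])          # cointegrating vector
alpha_tilde = np.array([[2.0], [-1.0]])    # cofeature vector: alpha_tilde' [A1*, gamma] = 0

# Equivalent VAR(2) in levels: y_t = B1 y_{t-1} + B2 y_{t-2} + eps_t
B1 = np.eye(2) + A1_star + gamma @ alpha.T
B2 = -A1_star


def forecast(y_t, y_tm1, h):
    """Iterate point forecasts E_t y_{t+1}, ..., E_t y_{t+h}."""
    path = [np.asarray(y_tm1, dtype=float), np.asarray(y_t, dtype=float)]
    for _ in range(h):
        path.append(B1 @ path[-1] + B2 @ path[-2])
    return np.array(path[2:])


y_tm1, y_t = np.array([1.0, 0.5]), np.array([1.3, 0.4])
f = forecast(y_t, y_tm1, h=100)
df = np.diff(np.vstack([y_t, f]), axis=0)   # E_t dy_{t+1}, ..., E_t dy_{t+h}

cofeature_path = df @ alpha_tilde.ravel()   # zero (up to rounding) at every horizon
coint_path = f @ alpha.ravel()              # nonzero at short horizons, decays to zero
print(np.abs(cofeature_path).max())
print(coint_path[:3], coint_path[-1])
```

The cofeature combination of the forecast differences vanishes from the first horizon onward, so $\tilde\alpha' E_t y_{t+h}$ stays at its current value. The cointegrating combination, by contrast, only approaches zero as h grows.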

2.1 Model selection criteria for reduced-rank models

When dealing with potentially cointegrated VARs, a usual procedure in applied work entails the following steps:

1. Using standard information criteria (AIC, HQ or SC), the lag length of the VAR in levels is chosen for subsequent cointegration analysis.

2. Using the lag length chosen in step 1 above, cointegration tests are performed. Most of the time, the full-information maximum likelihood method proposed by Johansen (1989, 1991) is used.

3. Conditional on the results of cointegration analysis, a final VECM is estimated and multi-step ahead forecasts are computed.

In this case, there is no need to compute canonical correlations, and the system can be estimated by OLS, equation by equation. Clearly, if the model chosen in step 1 is misspecified, inference and forecasting based on it will inherit that misspecification. If samples are large enough, we should expect consistent information criteria to work properly in choosing the lag length. However, Vahid and Issler (2002) showed that, when samples are small and VAR models have SCCF restrictions, standard information criteria tend to choose too small a lag length. In our context this is a potential source of misspecification in step 1, since we consider cointegrated VARs with SCCF restrictions. For these models, our first exercise is therefore to examine the performance of standard information criteria in choosing the lag length. Based on the optimal choice in step 1, we then examine how well the above procedure chooses the number of cointegrating relationships using Johansen's cointegration test. Finally, we compute multi-step-ahead forecasts with the chosen cointegrated VAR and store the results for comparison with an alternative procedure described below.

In the context of VARs with SCCF restrictions, Vahid and Issler suggested an alternative method for choosing lag length. It consists of choosing the lag order p and the number of common cycles r (i.e., the rank of $\Phi$) simultaneously, by minimizing one of the following model selection criteria²:

$$AIC(p,r) = \sum_{i=n-r+1}^{n}\ln\left(1-\lambda_i(p)\right) + \frac{2}{T}\times r\times(np+n-r) \tag{12}$$

$$HQ(p,r) = \sum_{i=n-r+1}^{n}\ln\left(1-\lambda_i(p)\right) + \frac{2\ln\ln T}{T}\times r\times(np+n-r) \tag{13}$$

$$SC(p,r) = \sum_{i=n-r+1}^{n}\ln\left(1-\lambda_i(p)\right) + \frac{\ln T}{T}\times r\times(np+n-r), \tag{14}$$

where:
- n is the dimension of (number of series in) the system;
- r is the rank of the VECM coefficient matrix $\Phi$;
- p is the number of lagged differences in the VECM;
- T is the number of observations;
- $\lambda_i(p)$ are the sample squared canonical correlations between $\Delta y_t$ and the set of regressors $x_t$, ordered from smallest to largest so that the sums run over the r largest.³

For the case q = 0, Vahid and Issler reported a substantial reduction in how often misspecified models (lag length too short) were chosen. Forecasting performance was also enhanced. Because of these results, we propose to use this method to choose the lag length and rank order of the VECM.

We now discuss the procedure used to choose the cointegrating rank of the final models. For this choice, the typology of SCCF restrictions suggested by Hecq, Palm and Urbain (2004) is useful. They distinguish two types of nested SCCF restrictions. The first is the so-called strong-form SCCF, in which

$$\tilde\alpha' A_i^* = 0,\quad i = 1, 2, \dots, p, \quad\text{and}\quad \tilde\alpha'\gamma = 0,$$

where the cofeature vectors annihilate the short-run dynamic matrices $A_i^*$ and the adjustment coefficient matrix $\gamma$ as well. The second is the so-called weak-form SCCF, in which

$$\tilde\alpha' A_i^* = 0,\quad i = 1, 2, \dots, p,$$

where the cofeature vectors annihilate only the short-run dynamic matrices $A_i^*$. Obviously, weak-form SCCF restrictions are a necessary condition for strong-form SCCF restrictions.
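The distinction between the two forms is easy to check for given parameter matrices. The function below is a hypothetical helper, not part of the paper, and the matrices in the example are illustrative values. It tests whether $\tilde\alpha'$ annihilates every $A_i^*$ (weak form) and, in addition, $\gamma$ (strong form), up to a numerical tolerance.

```python
import numpy as np


def sccf_form(alpha_tilde, A_star, gamma, tol=1e-10):
    """Classify SCCF restrictions as 'strong', 'weak' or 'none'.

    alpha_tilde: n x s cofeature matrix; A_star: list of n x n matrices A_i*;
    gamma: n x q adjustment matrix.
    """
    weak = all(np.allclose(alpha_tilde.T @ A, 0.0, atol=tol) for A in A_star)
    strong = weak and np.allclose(alpha_tilde.T @ gamma, 0.0, atol=tol)
    if strong:
        return "strong"
    return "weak" if weak else "none"


A1_star = np.array([[0.2, 0.1], [0.4, 0.2]])    # hypothetical values
alpha_tilde = np.array([[2.0], [-1.0]])
print(sccf_form(alpha_tilde, [A1_star], np.array([[0.2], [0.4]])))   # strong
print(sccf_form(alpha_tilde, [A1_star], np.array([[0.2], [0.1]])))   # weak
```

The strong form is tested only after the weak-form condition holds, so the function can never return "strong" for a model that fails the weak-form test. This mirrors the nesting noted in the text.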

² When the variables are not cointegrated, q = 0 and these criteria are the same as those suggested in Lütkepohl (1993, p. 202).
³ Vahid and Engle (1993) showed that $r \ge q$, which implies that, given q, models of rank smaller than q need not be considered.
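Criteria (12)-(14) need only the squared canonical correlations and a few counts. The sketch below is our own illustration, and the function names are hypothetical. It computes the squared canonical correlations as the eigenvalues of $S_{00}^{-1}S_{01}S_{11}^{-1}S_{10}$, forming the moment matrices from zero-mean regressands and regressors. It sorts them in ascending order so that the sum runs over the r largest, and then evaluates the chosen criterion. Setting r = n gives the full-rank criteria IC(p).

```python
import numpy as np


def squared_canonical_correlations(U, V):
    """Squared canonical correlations between the columns of U (T x n) and
    V (T x k), assuming both are already zero-mean; returned in ascending order."""
    S00, S11, S01 = U.T @ U, V.T @ V, U.T @ V
    M = np.linalg.solve(S00, S01) @ np.linalg.solve(S11, S01.T)
    lam = np.sort(np.linalg.eigvals(M).real)
    return np.clip(lam, 0.0, 1.0)


def ic_pr(lam, n, p, r, T, kind="SC"):
    """Evaluate AIC(p, r), HQ(p, r) or SC(p, r) as in (12)-(14).

    lam: the n squared canonical correlations for lag order p, ascending.
    """
    weight = {"AIC": 2.0 / T,
              "HQ": 2.0 * np.log(np.log(T)) / T,
              "SC": np.log(T) / T}[kind]
    fit = np.sum(np.log(1.0 - lam[n - r:]))   # the r largest correlations
    return fit + weight * r * (n * p + n - r)


# Hypothetical squared canonical correlations for n = 3, p = 2, T = 100
lam = np.array([0.1, 0.5, 0.9])
scores = {r: ic_pr(lam, n=3, p=2, r=r, T=100) for r in (1, 2, 3)}
print(min(scores, key=scores.get))            # rank chosen by SC(p, r) for this p
```

In a full search, the same evaluation would be repeated over a grid of p values, each with its own canonical correlations, and the (p, r) pair with the smallest criterion value kept.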

Moreover, the choice of rank r for all matrices $A_i^*$, $i = 1, 2, \dots, p$, affects the estimates of the cointegrating vectors in $\alpha'$ and of the adjustment coefficient matrix $\gamma$. This happens for two reasons: there is a one-to-one relationship between the matrices $A_i^*$ in (21) and the matrices $A_i$ in (3), and $\gamma\alpha' = -(I - A_1 - \cdots - A_{p+1})$. The usual practice in applied work on cointegrated VARs with SCCF restrictions is to test for cointegration first and then, conditional on the cointegration-test results, to test for SCCF restrictions. The algorithm we propose does not follow that route. It imposes weak-form SCCF restrictions without any a priori constraint on the cointegrating space. At the end of the procedure, however, the cointegrating rank is estimated using standard tests. A full description of the algorithm is:

1. First, take the optimal pair (p, r) chosen by information criterion (12), (13), or (14). The criteria are based on the squared canonical correlations between $\mu_t$ and $\nu_t$, where $\mu_t$ is the residual of the regression of $\Delta y_t$ on $y_{t-1}$ and $\nu_t$ stacks the residuals of the regressions of the lagged $\Delta y_t$'s on $y_{t-1}$. Notice that no constraint on the cointegrating rank is imposed at this stage.

2. Using the optimal pair (p, r) chosen in step 1, compute the coefficient matrices $A_i^*$, $i = 1, 2, \dots, p$, labelled $\hat A_i^*$. Notice that these estimated matrices will have rank r.

3. Next, compute two sets of residuals from regressions on $\left(\hat A_1^*\Delta y_{t-1} + \dots + \hat A_p^*\Delta y_{t-p}\right)$: those of $\Delta y_t$, labelled $u_t^0$, and those of $y_{t-1}$, labelled $u_t^1$. Using the … between $u_t^0$ and $u_t^1$, compute an estimate of $\gamma\alpha'$, labelled $\hat\Pi$.

4. Using $\hat\Pi$ from step 3, compute the squared canonical correlations between the residual of the regression of $\Delta y_t$ on $\hat\Pi y_{t-1}$ and the residuals of the regressions of the lagged $\Delta y_t$'s on $\hat\Pi y_{t-1}$.

5. Go to step 2 and obtain a new set of estimates of the coefficient matrices $A_i^*$, $i = 1, 2, \dots, p$, labelled $\bar A_i^*$. Notice that these estimated matrices will also have rank r.

6. Compare $\bar A_i^*$ with $\hat A_i^*$ using some tolerance criterion, and keep repeating steps 4 and 5 until convergence. After convergence, compute the eigenvalues of $\Pi^*$, the last-round estimate of $\gamma\alpha'$. Using them, determine the number of cointegrating vectors q by Johansen's method (trace test at 5% significance).

At the end, this algorithm produces an optimal triplet (p, q, r), which can be used to estimate a VECM with SCCF restrictions and to forecast with it. Its forecasting performance can be compared with that of a regular VECM obtained from the first procedure described in this section, in which SCCF restrictions are ignored at every modelling stage.
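Step 2 requires rank-r estimates of the short-run matrices, but the text does not spell out the estimator. One natural candidate, assumed here purely for illustration, is Gaussian reduced-rank regression of $\mu_t$ on $\nu_t$. It solves the eigenvalue problem for $S_{11}^{-1}S_{10}S_{00}^{-1}S_{01}$ and keeps the r leading eigenvectors $\hat\beta$, normalized so that $\hat\beta' S_{11}\hat\beta = I$. The coefficient estimate is then $S_{01}\hat\beta\hat\beta'$. The function name and the simulated stand-in data are hypothetical. With r = min(n, k), the estimate reproduces OLS.

```python
import numpy as np


def reduced_rank_regression(Y, X, r):
    """Gaussian reduced-rank regression of Y (T x n) on X (T x k), rank(B) = r.

    Returns B (n x k) with Y_t ~ B X_t, and the squared canonical correlations
    in descending order. Y and X are assumed to be zero-mean residual series.
    """
    T = Y.shape[0]
    S00, S11, S01 = Y.T @ Y / T, X.T @ X / T, Y.T @ X / T
    L = np.linalg.cholesky(S11)                   # S11 = L L'
    L_inv = np.linalg.inv(L)
    W = L_inv @ S01.T @ np.linalg.solve(S00, S01) @ L_inv.T   # symmetric k x k
    eigvals, vecs = np.linalg.eigh(W)             # ascending order
    eigvals, vecs = eigvals[::-1], vecs[:, ::-1]
    beta = L_inv.T @ vecs[:, :r]                  # beta' S11 beta = I_r
    alpha = S01 @ beta                            # loadings given beta
    return alpha @ beta.T, eigvals


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4))                 # stand-in for nu_t (hypothetical data)
B_true = np.outer([1.0, 0.5], [0.3, -0.2, 0.1, 0.0])
Y = X @ B_true.T + 0.5 * rng.standard_normal((200, 2))   # stand-in for mu_t
B1, lam = reduced_rank_regression(Y, X, r=1)
print(np.linalg.matrix_rank(B1), lam[:2])
```

Inside the algorithm, Y and X would be the residual series $\mu_t$ and $\nu_t$ from step 1, with the regressors ordered as $\Delta y_{t-1}, \dots, \Delta y_{t-p}$. The resulting n × np estimate would then be split column-wise into $\hat A_1^*, \dots, \hat A_p^*$.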

3 Monte-Carlo design

To keep the presentation manageable, we use a three-dimensional VAR as the DGP, i.e., n = 3. Models of the real side of the economy are often three-dimensional.

For example, King et al. (1991) estimate a VAR including output, consumption, and investment in order to test the real-business-cycle model of King, Plosser and Rebelo (1988). Issler and Ferreira (1998) use a VAR in output, labor, and capital inputs to estimate long-run elasticities of the aggregate production function. The first parameter we set in the Monte-Carlo design is the lag length p of the VECM. Notice that the lag length of the VAR in levels is then p + 1. We set p = 2, which allows for both under- and over-parameterization of the VAR model:

$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + A_3 y_{t-3} + \varepsilon_t. \tag{15}$$

Next, we set the number of cointegrating vectors to one, i.e., q = 1, and the number of cofeature vectors to two, i.e., n − r = 2, or r = 1. The values of the cointegrating vector and of the cofeature vectors were set as:

$$\alpha = \begin{bmatrix} 1.0\\ 0.2\\ 1.0 \end{bmatrix} \quad\text{and}\quad \tilde\alpha = \begin{bmatrix} 1.0 & 0.1\\ 0.0 & 1.0\\ -0.5 & -0.5 \end{bmatrix}. \tag{16}$$

Conditional on these values, we then choose the remaining free parameters in the coefficient matrices $A_1$, $A_2$, and $A_3$ so as to keep all the eigenvalues of the companion matrix of the VECM inside the unit circle. Appendix B discusses the final choice of these free parameters in detail. As it makes clear, selecting a large variety of VARs that satisfy (16) and have all companion-matrix eigenvalues inside the unit circle is far from trivial. But this is a necessary condition for credible Monte-Carlo results. The final number of such DGPs, with q = 1 and r = 1, was set equal to 100. For each of these 100 DGPs (choices of the $A_i$'s), we generated one thousand samples of $y_t$ by drawing random series of $\varepsilon_t$. Each of these 1000 samples had 1000 observations. In all cases, to reduce the impact of initial values on the simulated series, we used only the last 100 or 200 observations in running regressions.

As discussed in Vahid and Issler (2002), it is worth sorting results by the signal-to-noise ratio (or system $R^2$ measures). Here, we select two different sets of parameters. In the first, the median of the system $R^2$ measure is between 0.4 and 0.5, with 3% of values larger than 0.6 and none greater than 0.7. In the second, the median is between 0.7 and 0.8, with 22% of values larger than 0.8 and none greater than 0.9.

The Monte-Carlo procedure can be summarized as follows. Using each of our 100 DGPs, we generated 1000 samples (once with 100 and again with 200 observations).
We then recorded the lag length chosen by two sets of information criteria. The traditional (full-rank) criteria, labelled IC(p), are AIC(p), HQ(p) and SC(p), obtained by setting r = 3 in (12)-(14). The alternative criteria, labelled IC(p, r), are AIC(p, r), HQ(p, r) and SC(p, r) in (12)-(14). For choices made using IC(p), we used Johansen's (1989, 1991) trace test at 5% to choose q and then estimated a VECM with no SCCF restrictions. The out-of-sample forecasting accuracy measures of these models were recorded up to 16 periods ahead. For choices made using IC(p, r), we used the algorithm described in the last section to obtain an optimal triplet (p, q, r) in each case. Then, a VECM with SCCF restrictions was estimated

for each of them, and their respective out-of-sample forecasting accuracy measures were recorded up to 16 periods ahead.
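Checking candidate DGPs is mechanical once the coefficient matrices are in hand. The sketch below is one way to operationalize the stability requirement for a cointegrated VAR in levels, such as (15). It is our assumption rather than the authors' exact procedure, which Appendix B describes. The rule is that the companion matrix should have exactly n − q unit eigenvalues, one per common trend, with all remaining eigenvalues strictly inside the unit circle. The bivariate matrices in the example are hypothetical and are not one of the paper's 100 DGPs.

```python
import numpy as np


def companion(B):
    """Companion matrix of y_t = B_1 y_{t-1} + ... + B_m y_{t-m}."""
    n, m = B[0].shape[0], len(B)
    top = np.hstack(B)
    if m == 1:
        return top
    shift = np.hstack([np.eye(n * (m - 1)), np.zeros((n * (m - 1), n))])
    return np.vstack([top, shift])


def is_valid_cointegrated_dgp(B, q, tol=1e-6):
    """True if the levels VAR has exactly n - q unit roots and all other
    eigenvalues strictly inside the unit circle."""
    n = B[0].shape[0]
    moduli = np.abs(np.linalg.eigvals(companion(B)))
    unit = np.abs(moduli - 1.0) < tol
    return int(unit.sum()) == n - q and bool(np.all(moduli[~unit] < 1.0))


B1 = np.array([[1.4, -0.1], [0.8, 0.8]])        # hypothetical VAR(2) in levels
B2 = np.array([[-0.2, -0.1], [-0.4, -0.2]])
print(np.sort(np.abs(np.linalg.eigvals(companion([B1, B2])))))
print(is_valid_cointegrated_dgp([B1, B2], q=1))
```

A tolerance is needed because computed unit eigenvalues are only equal to one up to rounding. In a search over free parameters, candidates failing this check would simply be discarded.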

3.1 Measuring forecast accuracy

Appropriate evaluation of forecasts depends on the specific use for which the forecasts are needed, i.e., the "loss function" of the user. Having applied economists as our target audience does not imply that we should evaluate the forecasts of alternative models in any specific way. A macroeconomist who models the growth rates of income, consumption and investment might in fact be interested in the growth rates of income, savings and investment. Alternatively, she might be interested in forecasting the levels, based on the growth rates. It is therefore important to evaluate the forecasting performance of different models with measures that are invariant to linear transformations of forecasts, at one horizon or across different horizons. One measure that satisfies this invariance property is the generalized forecast error second moment (GFESM), introduced by Clements and Hendry (1993). GFESM is the determinant of the expected value of the outer product of the vector of stacked forecast errors for all future periods up to the horizon of interest. For example, if forecasts up to h quarters ahead are of interest, this measure is:

$$GFESM = \left| E\left[ \begin{pmatrix} \tilde\varepsilon_{t+1}\\ \tilde\varepsilon_{t+2}\\ \vdots\\ \tilde\varepsilon_{t+h} \end{pmatrix} \begin{pmatrix} \tilde\varepsilon_{t+1}\\ \tilde\varepsilon_{t+2}\\ \vdots\\ \tilde\varepsilon_{t+h} \end{pmatrix}' \right] \right|,$$

where $\tilde\varepsilon_{t+h}$ is the n-dimensional forecast error at horizon h of our n-variable model. This measure is invariant to elementary operations that involve different variables, and also to elementary operations that involve the same variable at different horizons. In our Monte-Carlo, the above expectation is evaluated for every model by averaging over the simulations.

We also consider two popular measures of forecasting accuracy: the determinant of the mean squared forecast error matrix at each horizon (|MSFE|), and its trace (TMSFE). The determinant of the MSFE is invariant to elementary operations on the forecasts of different variables at a single horizon. It is not invariant to elementary operations on the forecasts across different horizons. The trace of the mean squared forecast error matrix is invariant to neither of these transformations.

There is one complication associated with simulating 100 different DGPs. Simple averaging across different DGPs is not appropriate, because the forecast errors of different DGPs do not have identical variance-covariance matrices. Lütkepohl (1985) normalizes the forecast errors by their true variance-covariance matrix in each case to obtain i.i.d. observations. Unfortunately, this would be very time-consuming for a measure like GFESM, which involves stacked errors over many horizons. Instead, for each information criterion, we calculate the percentage change in the forecasting measures between the full-rank models selected by IC(p) and the reduced-rank models chosen by IC(p, r). This is done at every iteration for every DGP, and the final results are then averaged.
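The three accuracy measures are straightforward to compute from simulated forecast errors. The sketch below stores errors in an array of shape (S, h, n), for S simulations, h horizons and n variables; this layout is our choice. As in the text, it estimates the expectation by averaging over simulations. The percentage improvement is defined here as 100 × (full − reduced)/full, which is an assumption, since the text does not give the formula. The function names are hypothetical.

```python
import numpy as np


def gfesm(errors):
    """GFESM: determinant of the average outer product of the stacked errors
    (e_{t+1}', ..., e_{t+h}')' over simulations; errors has shape (S, h, n)."""
    S, h, n = errors.shape
    stacked = errors.reshape(S, h * n)            # horizon-major stacking
    return np.linalg.det(stacked.T @ stacked / S)


def msfe_measures(errors, horizon):
    """|MSFE| and TMSFE at a single horizon (1-based)."""
    e = errors[:, horizon - 1, :]                 # S x n errors at that horizon
    msfe = e.T @ e / e.shape[0]
    return np.linalg.det(msfe), np.trace(msfe)


def pct_improvement(full, reduced):
    """Assumed definition: percentage improvement of reduced-rank over full-rank."""
    return 100.0 * (full - reduced) / full


rng = np.random.default_rng(0)
errors = rng.standard_normal((500, 4, 3))         # hypothetical errors: S=500, h=4, n=3
P = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # det(P) = 1
mixed = errors @ P.T                              # mixes variables at every horizon
print(gfesm(errors), gfesm(mixed))                # equal up to rounding
print(msfe_measures(errors, 1)[1], msfe_measures(mixed, 1)[1])      # traces differ
```

Mixing variables with a unit-determinant matrix leaves GFESM and |MSFE| unchanged but alters TMSFE. This matches the invariance properties listed above.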

4 Monte-Carlo simulation results

4.1 Selection of lag and rank order and number of cointegrating vectors

Table 1 shows the frequency with which each pair (p, q) is chosen by IC(p) followed by Johansen's (1989, 1991) test, in 1000 simulations of 100 trivariate VAR(3) models with rank one and a low system $R^2$ measure. Table 2 shows the analogous results for a high system $R^2$ measure. The following conclusions emerge:

1. The total frequency with which the true lag length 3 is selected (adding up row 3 for all three IC(p)) varies very little with the system $R^2$. The true lag length and number of cointegrating vectors of the DGP is (3, 1), yet this pair is never the mode in Table 1 and is rarely the mode in Table 2. Nevertheless, the frequency with which (3, 1) is selected rises with both the sample size and the $R^2$ measure.

2. The AIC(p) criterion has a slight tendency to overestimate the number of lags. The opposite happens for HQ(p) and SC(p), and the tendency is very strong in both cases. This result is very similar to the one reported in Vahid and Issler (2002). In both tables, AIC(p) chooses the correct (p, q) pairs more often than the other two criteria, with HQ(p) in second place. The modal choice of the SC(p) criterion is a VAR(2), even with 200 observations or with a high system $R^2$ measure.

Table 3 shows the frequency with which each lag-rank-cointegrating-vector triplet (p, r, q) is selected, in 1000 simulations of 100 trivariate VAR(3) models with rank one, one cointegrating vector and a low system $R^2$ measure. The selection has two steps: (i) choose the number of lags and the rank simultaneously by AIC(p, r), HQ(p, r) and SC(p, r); and (ii) apply the algorithm described above and use Johansen's (1989, 1991) cointegration test to select the number of cointegrating vectors. Table 4 shows the analogous results for a high system $R^2$ measure. The following conclusions emerge:

1. The total frequency with which the true lag length (2 lags in differences) is selected (adding up row 2 for all three IC(p, r)) varies very little with the system $R^2$. The mode for lag length is always the correct one (2 lags in differences), except for SC(p, r) with sample size 100.

2. The AIC(p, r) criterion has a slight tendency to overestimate the number of lags. The opposite happens for HQ(p, r) and SC(p, r), and the tendency is very strong in both cases. This result is very similar to the one reported in Vahid and Issler (2002). In both tables, HQ(p, r) chooses the correct (p, q, r) triplets more often than the other two criteria, with SC(p, r) in second place. For high system $R^2$ measures, the modal choice of the HQ(p, r) criterion is the correct one, with a very high frequency.

Overall, the results in Tables 1 to 4 confirm that, when data has SCCF restrictions, ignoring them has a high cost in terms of model selection. Moreover, selecting lag and rank simultaneously — using IC (p, r) — greatly improves model-selection performance. The next question to answer is whether or not this superior performance in model selection translates into improvements in forecasting accuracy.

4.2 Forecasts

This section compares the forecasting behavior of the two model selection strategies used so far. The first is the procedure widely used in practice: select the lag order by standard information criteria IC(p), then test for cointegration using Johansen's (1989, 1991) test. The second takes SCCF restrictions into account in a new way. First, information criteria IC(p, r) are used to choose lag and rank order simultaneously. Next, weak-form SCCF restrictions on the VECM coefficient matrices are imposed if IC(p, r) indicates that they are present. Finally, a new algorithm estimates the long-run impact matrix Π, after which the cointegrating rank is estimated. Needless to say, the first procedure completely ignores the possible existence of SCCF restrictions, whereas the second one does not.

Table 5 shows the percentage improvement of the second procedure over the first, for different forecasting measures, in systems with low $R^2$ measures. Table 6 shows the analogous results for systems with high $R^2$ measures. In the TMSFE column, for each horizon, a bold number marks the criterion with the best TMSFE performance and an underlined number marks the criterion with the worst. We look first at the results with a low system $R^2$ measure, and the following conclusions emerge:

1. Overall, there is a relatively large forecasting improvement at short horizons, although some of these gains are reversed as the horizon increases. The results for systems with high R² measures are impressive.
2. The HQ criterion produces the best forecasting models when the sample size is small and R² is low. This result is similar to the one obtained in Vahid and Issler (2002). When the sample size is large, SC performs best.
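The three accuracy measures reported in Tables 5 and 6 can be computed from simulated forecast errors along the following lines. This sketch assumes the errors are stored in an array of shape (replications, horizons, variables). TMSFE and |MSFE| are the trace and the determinant of the h-step MSFE matrix, and GFESM follows Clements and Hendry (1993) in taking the determinant of the second-moment matrix of the errors stacked over horizons 1 to h, so it coincides with |MSFE| at h = 1, as in the first row of each panel of Tables 5 and 6. The function name is hypothetical, and how the paper converts these measures into percentage improvements is not reproduced here.

```python
import numpy as np


def forecast_accuracy(errors):
    """errors: array (R, H, n) of forecast errors for R replications,
    horizons 1..H and n variables. Returns one value per horizon."""
    R, H, n = errors.shape
    tmsfe, det_msfe, gfesm = np.empty(H), np.empty(H), np.empty(H)
    for h in range(H):
        e_h = errors[:, h, :]                             # R x n errors at horizon h + 1
        msfe = e_h.T @ e_h / R                            # n x n MSFE matrix
        tmsfe[h] = np.trace(msfe)
        det_msfe[h] = np.linalg.det(msfe)
        stacked = errors[:, :h + 1, :].reshape(R, -1)     # R x n(h+1): horizons 1..h+1
        gfesm[h] = np.linalg.det(stacked.T @ stacked / R)
    return {"TMSFE": tmsfe, "|MSFE|": det_msfe, "GFESM": gfesm}


# Hypothetical usage with i.i.d. standard normal errors (R = 5000, H = 4, n = 3)
rng = np.random.default_rng(3)
measures = forecast_accuracy(rng.standard_normal((5000, 4, 3)))
```

Stacking horizons in a different order would permute rows and columns of the second-moment matrix, which leaves the determinant, and hence GFESM, unchanged.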

5 Conclusion

This paper argues that in multivariate macroeconometric modelling, the stylized fact that “macroeconomic aggregates move together over the business cycle” should be taken seriously. Time-series macroeconometric models provide useful forecasts at short horizons (1 to 8 periods). The paper compares the forecasting behavior of two model-selection strategies. The first, widely used in practice, selects the lag order with a standard information criterion IC(p) and then tests for cointegration using Johansen's (1989, 1991) test; it ignores completely the possible existence of SCCF restrictions. The second takes these restrictions into account: it uses the information criteria IC(p, r) to choose the lag order and rank simultaneously, imposes weak-form SCCF restrictions on the VECM coefficient matrices whenever IC(p, r) indicates that they are present, and, after using a new algorithm to estimate the long-run impact matrix Π, estimates the cointegrating rank. Our conclusion is that, overall, accounting for SCCF restrictions yields relatively large forecasting improvements at short horizons, and the gains for systems with high R² measures are particularly impressive. These and previous results on SCCF restrictions in VAR analysis call for a change of focus in the literature on forecasting with these models.

While in the past there has been considerable effort examining the importance of cointegration restrictions in VAR models (see, among others, Engle and Yoo (1987), Clements and Hendry (1995), and Lin and Tsay (1996)), there has been little work examining the importance of common-cyclical features in VAR analysis. However, the simulation results in Vahid and Issler and the richer environment considered here show superior forecasting performance for models incorporating SCCF restrictions, irrespective of whether cointegration is present in the data and of the forecasting horizon under study. Finally, it should be stressed that the message of this paper is that short-run restrictions are likely to be more important than cointegrating restrictions for forecasting at business-cycle horizons. Here, we have only considered common-cycle restrictions because of their important macroeconomic implications. We leave the investigation of possible gains resulting from other restrictions, such as block exogeneity restrictions, codependence and other types of rank restrictions, for future research.

References

[1] Ahn, S.K. and G.C. Reinsel (1988), “Nested reduced-rank autoregressive models for multiple time series”, Journal of the American Statistical Association, 83, 849-856.

[2] Beveridge, S. and C.R. Nelson (1981), “A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the ‘Business Cycle’”, Journal of Monetary Economics, 7, 151-174.

[3] Carlino, G. and K. Sill (1998), “Common trends and common cycles in regional per-capita incomes”, Working Paper, Federal Reserve Bank of Philadelphia.

[4] Clements, M.P. and D.F. Hendry (1993), “On the limitations of comparing mean squared forecast errors,” Journal of Forecasting, 12, 617-637 (with discussions).

[5] Clements, M.P. and D.F. Hendry (1995), “Forecasting in cointegrated systems”, Journal of Applied Econometrics, 10, 127-146.

[6] Engle, R.F. and C.W.J. Granger (1987), “Cointegration and error correction: Representation, estimation and testing”, Econometrica, 55, 251-276.

[7] Engle, R.F. and J.V. Issler (1995), “Estimating common sectoral cycles”, Journal of Monetary Economics, 35, 83-113.

[8] Engle, R.F. and S. Yoo (1987), “Forecasting and testing in cointegrated systems”, Journal of Econometrics, 35, 143-159.

[9] Granger, C.W.J., M.L. King and H. White (1995), “Comments on testing economic theories and the use of model selection criteria”, Journal of Econometrics, 67, 173-187.

[10] Hamilton, J.D. (1994), Time Series Analysis, Princeton University Press.

[11] Issler, J.V. and F. Vahid (2001), “Common cycles and the importance of transitory shocks to macroeconomic aggregates,” forthcoming in the Journal of Monetary Economics.

[12] Issler, J.V. and P.C. Ferreira (1998), “Time-Series properties and empirical evidence of growth and infrastructure,” The Brazilian Review of Econometrics, 18(1), 31-71.

[13] King, R., C.I. Plosser, J.H. Stock and M.W. Watson (1991), “Stochastic Trends and Economic Fluctuations”, American Economic Review, 81, 819-840.

[14] King, R.G., C.I. Plosser and S. Rebelo (1988), “Production, Growth and Business Cycles. II. New Directions,” Journal of Monetary Economics, 21, 309-341.

[15] Lin, J.L. and R.S. Tsay (1996), “Cointegration constraints and forecasting: An empirical examination”, Journal of Applied Econometrics, 11, 519-538.

[16] Lucas, R.E. (1977), “Understanding business cycles”, Carnegie-Rochester Series on Public Policy, 5, 7-29, also reprinted in Lucas, R. E. (Ed) Models of Business Cycle, 1977.

[17] Lütkepohl, H. (1993), Introduction to Multiple Time Series Analysis, Second Edition, Springer-Verlag.

[18] Lütkepohl, H. (1985), “Comparison of criteria for estimating the order of a vector autoregressive process”, Journal of Time Series Analysis, 6, 35-52.

[19] Nickelsburg, G. (1982), “Small sample properties of dimensionality statistics for fitting VAR models to aggregate economic data: A Monte Carlo study”, Proceedings of the Business and Economic Statistics Section of the American Statistical Association, 155-160.

[20] Reinsel, G.C. (1993), Elements of Multivariate Time Series Analysis, Springer-Verlag.

[21] Stock, J.H. and M.W. Watson (1988), “Testing for common trends”, Journal of the American Statistical Association, 83, 1097-1107.

[22] Stock, J.H. and M.W. Watson (1989), “New Indexes of Leading and Coincident Economic Indicators”, NBER Macroeconomics Annual, 351-395.

[23] Tiao, G.C. and R.S. Tsay (1989), “Model specification in multivariate time series (with discussion)”, Journal of the Royal Statistical Society, Series B, 51, 157-213.

[24] Tso, M.K-S. (1981), “Reduced rank regression and Canonical analysis” Journal of the Royal Statistical Society, Series B, 43, 183-189.

[25] Vahid, F. (1999), “A property of the companion matrix of a reduced rank VAR”, Econometric Theory, 15, 787-788.

[26] Vahid, F. and R.F. Engle (1993), “Common trends and common cycles”, Journal of Applied Econometrics, 8, 341-360.

[27] Vahid, F. and J.V. Issler (1999), “The Importance of Common-Cyclical Features in VAR Analysis: A Monte-Carlo Study,” Working Paper # 352, Graduate School of Economics, Getulio Vargas Foundation. Downloadable from www.fgv.br/epge/home/publi/Ensaios/ensaio consulta.cfm.

[28] Velu, R.P., G.C. Reinsel and D.W. Wichern (1986), “Reduced rank models for multiple time series”, Biometrika, 73, 105-118.

[29] Zellner, A. and C. Hong (1989), “Forecasting International Growth Rates Using Bayesian Shrinkage and other Procedures”, Journal of Econometrics, 40, 183-202.

A Co-Movement Restrictions in Dynamic Models

Before discussing the dynamic representation of the data and the trend-cycle decomposition method we have used, we present the definitions of common trends and common cycles; for a full discussion, see Engle and Granger (1987) and Vahid and Engle (1993), respectively. First, we assume that $y_t$ is an $n$-vector of I(1) variables, with the stationary (MA($\infty$)) Wold representation given by:

$$\Delta y_t = C(L)\,\varepsilon_t, \qquad (17)$$

where $C(L)$ is a matrix polynomial in the lag operator $L$, with $C(0) = I_n$ and $\sum_{j=1}^{\infty}\|C_j\| < \infty$. The vector $\varepsilon_t$ is an $n \times 1$ vector of stationary one-step-ahead linear forecast errors in $y_t$, given information on the lagged values of $y_t$. We can rewrite equation (17) as:

$$\Delta y_t = C(1)\,\varepsilon_t + \Delta C^*(L)\,\varepsilon_t, \qquad (18)$$

where $C_i^* = -\sum_{j>i} C_j$ for all $i$. In particular, $C_0^* = I_n - C(1)$. If we integrate both sides of equation (18), we get:

$$y_t = C(1)\sum_{s=0}^{\infty}\varepsilon_{t-s} + C^*(L)\,\varepsilon_t = \tau_t + C_t. \qquad (19)$$

Equation (19) is the multivariate version of the Beveridge-Nelson trend-cycle representation (Beveridge and Nelson (1981)). The series in $y_t$ are represented as the sum of a random-walk part $\tau_t$, called the “trend”, and a stationary part $C_t$, called the “cycle”.
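As a minimal numerical check of (17) to (19), assume (hypothetically) that $\Delta y_t$ follows a VMA(1), $\Delta y_t = \varepsilon_t + C_1\varepsilon_{t-1}$. Then $C(1) = I_n + C_1$, $C_0^* = I_n - C(1) = -C_1$ and $C_j^* = 0$ for $j \ge 1$, so the trend is $C(1)$ times the cumulated shocks and the cycle is $-C_1\varepsilon_t$. The sketch starts the process at zero ($\varepsilon_{-1} = 0$, $y_{-1} = 0$) so that the infinite sum in (19) becomes a finite cumulative sum; the matrix `C1` is an illustrative choice.

```python
import numpy as np

# Hypothetical VMA(1) for the differences: Delta y_t = eps_t + C1 eps_{t-1}
rng = np.random.default_rng(1)
n, T = 2, 500
C1 = np.array([[0.4, 0.2], [-0.1, 0.3]])
eps = rng.standard_normal((T, n))                     # row t holds eps_t'
eps_lag = np.vstack([np.zeros((1, n)), eps[:-1]])     # eps_{t-1}, with eps_{-1} = 0
dy = eps + eps_lag @ C1.T                             # Delta y_t
y = np.cumsum(dy, axis=0)                             # levels, starting from y_{-1} = 0

C_at_1 = np.eye(n) + C1                               # C(1) = C_0 + C_1
C_star_0 = np.eye(n) - C_at_1                         # C*_0 = I_n - C(1); C*_j = 0 for j >= 1
trend = np.cumsum(eps, axis=0) @ C_at_1.T             # tau_t = C(1)(eps_0 + ... + eps_t)
cycle = eps @ C_star_0.T                              # C*(L) eps_t = C*_0 eps_t
```

Because the data are stored one observation per row, every matrix multiplies from the right in transposed form.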

Definition 1 The variables in $y_t$ are said to have common trends (or cointegrate) if there are $r$ linearly independent vectors, $r < n$, stacked in an $r \times n$ matrix $\alpha'$, with the property that:⁴

$$\underset{r\times n}{\alpha'}\,C(1) = 0.$$

Definition 2 The variables in $y_t$ are said to have SCCF (or common cycles) if there are $s$ linearly independent vectors, $s \le n - r$, stacked in an $s \times n$ matrix $\tilde{\alpha}'$, with the property that:

$$\underset{s\times n}{\tilde{\alpha}'}\,C^*(L) = 0.$$

⁴This definition could alternatively be expressed in terms of an $n \times r$ matrix $\gamma$ such that $C(1)\gamma = 0$. The Granger Representation Theorem (Engle and Granger (1987)) shows that if the series in $y_t$ are cointegrated, $\alpha$ and $\gamma$ in equation (21) below satisfy $C(1)\gamma = 0$ and $\alpha' C(1) = 0$.

Thus, cointegration and common cycles represent restrictions on the elements of $C(1)$ and $C^*(L)$, respectively. We now discuss restrictions on the dynamic autoregressive representation of economic time series arising from cointegration (common trends) and common cycles. First, we assume that $y_t$ is generated by a vector autoregression (VAR):

$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + \varepsilon_t. \qquad (20)$$

If the elements of $y_t$ cointegrate, then the matrix $I - \sum_{i=1}^{p} A_i$ must have less than full rank, which imposes cross-equation restrictions on the VAR. In this case, Engle and Granger (1987) show that the system (20) can be written as a vector error-correction model (VECM):

$$\Delta y_t = A_1^*\,\Delta y_{t-1} + \cdots + A_{p-1}^*\,\Delta y_{t-p+1} + \gamma\alpha' y_{t-1} + \varepsilon_t, \qquad (21)$$

where $\gamma$ and $\alpha$ are full-rank $n \times r$ matrices, $r$ is the rank of the cointegrating space, $-\left(I - \sum_{i=1}^{p} A_i\right) = \gamma\alpha'$, and $A_j^* = -\sum_{i=j+1}^{p} A_i$, $j = 1, \ldots, p-1$. Given the cointegrating vectors stacked in $\alpha'$, it can be seen that (21) parsimoniously encompasses (20). Conditional on knowledge of the cointegrating vectors, the VECM has $n^2(p-1) + n \cdot r$ parameters in the conditional mean, while the VAR has $n^2 p$ parameters. Thus, the former has $n(n-r)$ fewer parameters, since $r < n$.
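The mapping between (20) and (21) is mechanical. The sketch below, with hypothetical helper names, computes $\Pi = \gamma\alpha' = -(I - \sum_i A_i)$ and $A_j^* = -\sum_{i>j} A_i$ from the VAR coefficients, and inverts the mapping ($A_1 = I + \Pi + A_1^*$, $A_i = A_i^* - A_{i-1}^*$, $A_p = -A_{p-1}^*$) to check that nothing is lost. It does not impose reduced rank on $\Pi$; in the paper's setting $\Pi$ would have rank $r$.

```python
import numpy as np


def var_to_vecm(A):
    """A = [A_1, ..., A_p]. Returns Pi = -(I - sum_i A_i) and [A*_1, ..., A*_{p-1}]."""
    n = A[0].shape[0]
    Pi = sum(A) - np.eye(n)
    A_star = [-sum(A[j:]) for j in range(1, len(A))]   # A*_j = -(A_{j+1} + ... + A_p)
    return Pi, A_star


def vecm_to_var(Pi, A_star):
    """Inverse map: A_1 = I + Pi + A*_1, A_i = A*_i - A*_{i-1}, A_p = -A*_{p-1}."""
    n = Pi.shape[0]
    if not A_star:                                     # p = 1: y_t = (I + Pi) y_{t-1} + e_t
        return [np.eye(n) + Pi]
    A = [np.eye(n) + Pi + A_star[0]]
    for i in range(1, len(A_star)):
        A.append(A_star[i] - A_star[i - 1])
    A.append(-A_star[-1])
    return A


# Hypothetical trivariate VAR(3)
rng = np.random.default_rng(4)
A = [0.2 * rng.standard_normal((3, 3)) for _ in range(3)]
Pi, A_star = var_to_vecm(A)
A_back = vecm_to_var(Pi, A_star)
```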

$$\tilde{\alpha} = \begin{bmatrix} I_s \\ \tilde{\alpha}^*_{(n-s)\times s} \end{bmatrix}$$

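Considering $\tilde{\alpha}'\Delta y_t = \tilde{\alpha}'\varepsilon_t$ as $s$ equations in a system, and completing the system by adding the unconstrained VECM equations for the remaining $n-s$ elements of $\Delta y_t$, we obtain:

$$\begin{bmatrix} I_s & \tilde{\alpha}^{*\prime} \\ 0_{(n-s)\times s} & I_{n-s} \end{bmatrix}\Delta y_t = \begin{bmatrix} 0_{s\times[n(p-1)+r]} \\ \left[\,A_1^{**} \;\cdots\; A_{p-1}^{**} \;\; \gamma^*\,\right] \end{bmatrix}\begin{bmatrix} \Delta y_{t-1} \\ \vdots \\ \Delta y_{t-p+1} \\ \alpha' y_{t-1} \end{bmatrix} + v_t, \qquad (22)$$

where $A_i^{**}$ and $\gamma^*$ represent the partitions of $A_i^*$ and $\gamma$, respectively, corresponding to the bottom $n-s$ reduced-form VECM equations, and $v_t = \begin{bmatrix} I_s & \tilde{\alpha}^{*\prime} \\ 0_{(n-s)\times s} & I_{n-s} \end{bmatrix}\varepsilon_t$.

It is easy to show that (22) parsimoniously encompasses (21). Since $\begin{bmatrix} I_s & \tilde{\alpha}^{*\prime} \\ 0_{(n-s)\times s} & I_{n-s} \end{bmatrix}$ is invertible, it is possible to recover (21) from (22). Notice, however, that the latter has $s\,[n(p-1)+r] - s\,(n-s)$ fewer parameters.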
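The parameter counts for (20), (21) and (22) can be tabulated with a few lines of arithmetic. This sketch counts conditional-mean coefficients only and treats $\alpha$ as known, as the text does; it assumes each of the $n-s$ unrestricted equations in (22) has $n(p-1)+r$ regressors and that $\tilde{\alpha}^*$ contributes $s(n-s)$ free coefficients. The function name is hypothetical, and the example values ($n = 3$, $p = 3$, $r = 1$, $s = 2$) match the dimensions of the trivariate design in Appendix B.

```python
def conditional_mean_params(n, p, r, s):
    """Coefficient counts for the VAR (20), the VECM (21) and the SCCF system (22)."""
    var = n * n * p
    vecm = n * n * (p - 1) + n * r
    sccf = (n - s) * (n * (p - 1) + r) + s * (n - s)
    return var, vecm, sccf


var, vecm, sccf = conditional_mean_params(n=3, p=3, r=1, s=2)
# var - vecm = n(n - r) = 6; vecm - sccf = s[n(p - 1) + r] - s(n - s) = 12
print(var, vecm, sccf)   # 27 21 9
```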

Definition 3 The variables in $y_t$ are said to have SCCF in weak form if there are $s$ linearly independent vectors, $s \le n - r$, stacked in an $s \times n$ matrix $\tilde{\alpha}'$, with the property that:

$$\underset{s\times n}{\tilde{\alpha}'}\,A_i^* = 0, \quad i = 1, 2, \ldots, p-1.$$

Definition 4 The variables in $y_t$ are said to have SCCF in strong form if there are $s$ linearly independent vectors, $s \le n - r$, stacked in an $s \times n$ matrix $\tilde{\alpha}'$, with the property that:

$$\underset{s\times n}{\tilde{\alpha}'}\,A_i^* = 0, \quad i = 1, 2, \ldots, p-1, \quad \text{and} \quad \underset{s\times n}{\tilde{\alpha}'}\,\gamma = 0.$$

This definition is equivalent to the definition of common cycles given above.

B VAR restrictions for the DGPs

The following VAR(3) in levels:

$$y_t = A_1 y_{t-1} + A_2 y_{t-2} + A_3 y_{t-3} + \varepsilon_t$$

can be rewritten as the following VAR(1) process:

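$$\begin{bmatrix}\Delta y_t \\ \Delta y_{t-1} \\ \alpha' y_t\end{bmatrix} = \begin{bmatrix} -(A_2 + A_3) & -A_3 & \gamma \\ I_3 & 0 & 0 \\ -\alpha'(A_2 + A_3) & -\alpha' A_3 & \alpha'\gamma + 1 \end{bmatrix}\begin{bmatrix}\Delta y_{t-1} \\ \Delta y_{t-2} \\ \alpha' y_{t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_t \\ 0 \\ \alpha'\varepsilon_t\end{bmatrix}, \qquad (23)$$

where $\gamma\alpha' = (A_1 + A_2 + A_3 - I_3)$,

$$\tilde{\alpha} = \begin{bmatrix}\tilde{\alpha}_{11} & \tilde{\alpha}_{12} \\ \tilde{\alpha}_{21} & \tilde{\alpha}_{22} \\ \tilde{\alpha}_{31} & \tilde{\alpha}_{32}\end{bmatrix}, \quad \alpha = \begin{bmatrix}\alpha_{11} \\ \alpha_{21} \\ \alpha_{31}\end{bmatrix}, \quad \gamma = \begin{bmatrix}\gamma_{11} \\ \gamma_{21} \\ \gamma_{31}\end{bmatrix},$$

$$A_2 = \begin{bmatrix} a^2_{11} & a^2_{12} & a^2_{13} \\ a^2_{21} & a^2_{22} & a^2_{23} \\ a^2_{31} & a^2_{32} & a^2_{33}\end{bmatrix} \quad\text{and}\quad A_3 = \begin{bmatrix} a^3_{11} & a^3_{12} & a^3_{13} \\ a^3_{21} & a^3_{22} & a^3_{23} \\ a^3_{31} & a^3_{32} & a^3_{33}\end{bmatrix}.$$

It is helpful to define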

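$$\xi_t = \begin{bmatrix}\Delta y_t \\ \Delta y_{t-1} \\ \alpha' y_t\end{bmatrix}, \quad F = \begin{bmatrix} -(A_2 + A_3) & -A_3 & \gamma \\ I_3 & 0 & 0 \\ -\alpha'(A_2 + A_3) & -\alpha' A_3 & \alpha'\gamma + 1 \end{bmatrix} \quad\text{and}\quad \nu_t = \begin{bmatrix}\varepsilon_t \\ 0 \\ \alpha'\varepsilon_t\end{bmatrix},$$

to arrive at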

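$$\xi_t = F\,\xi_{t-1} + \nu_t. \qquad (24)$$

If we consider cointegration and common-cycle restrictions, the following relations hold:

(i) $\tilde{\alpha}'A_3 = 0 \Rightarrow A_3 = \begin{bmatrix} G a^3_{31} & G a^3_{32} & G a^3_{33} \\ K a^3_{31} & K a^3_{32} & K a^3_{33} \\ a^3_{31} & a^3_{32} & a^3_{33}\end{bmatrix}$, where $G = -\left[\tilde{R}_{21}K + \tilde{R}_{31}\right]$, $K = (\tilde{R}_{32} - \tilde{R}_{31})/(\tilde{R}_{21} - \tilde{R}_{22})$, $\tilde{R}_{i1} = \tilde{\alpha}_{i1}/\tilde{\alpha}_{11}$ and $\tilde{R}_{i2} = \tilde{\alpha}_{i2}/\tilde{\alpha}_{12}$ $(i = 2, 3)$;

(ii) $\tilde{\alpha}'(A_2 + A_3) = 0 \Rightarrow \tilde{\alpha}'A_2 = 0 \Rightarrow A_2 = \begin{bmatrix} G a^2_{31} & G a^2_{32} & G a^2_{33} \\ K a^2_{31} & K a^2_{32} & K a^2_{33} \\ a^2_{31} & a^2_{32} & a^2_{33}\end{bmatrix}$;

(iii) $\tilde{\alpha}'\gamma = 0 \Rightarrow \gamma = \begin{bmatrix} G\gamma_{31} \\ K\gamma_{31} \\ \gamma_{31}\end{bmatrix}$;

(iv) $\alpha'(A_2 + A_3) = \left[(a^2_{31} + a^3_{31})S \;\;\; (a^2_{32} + a^3_{32})S \;\;\; (a^2_{33} + a^3_{33})S\right]$ and $\alpha'A_3 = \left[a^3_{31}S \;\;\; a^3_{32}S \;\;\; a^3_{33}S\right]$, where $S = \alpha_{11}G + \alpha_{21}K + \alpha_{31}$;

(v) $\alpha'\gamma + 1 = \gamma_{31}S + 1$.

The restrictions above imply that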

$$F = \begin{bmatrix}
-G(a^2_{31}+a^3_{31}) & -G(a^2_{32}+a^3_{32}) & -G(a^2_{33}+a^3_{33}) & -Ga^3_{31} & -Ga^3_{32} & -Ga^3_{33} & G\gamma_{31} \\
-K(a^2_{31}+a^3_{31}) & -K(a^2_{32}+a^3_{32}) & -K(a^2_{33}+a^3_{33}) & -Ka^3_{31} & -Ka^3_{32} & -Ka^3_{33} & K\gamma_{31} \\
-(a^2_{31}+a^3_{31}) & -(a^2_{32}+a^3_{32}) & -(a^2_{33}+a^3_{33}) & -a^3_{31} & -a^3_{32} & -a^3_{33} & \gamma_{31} \\
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
-(a^2_{31}+a^3_{31})S & -(a^2_{32}+a^3_{32})S & -(a^2_{33}+a^3_{33})S & -a^3_{31}S & -a^3_{32}S & -a^3_{33}S & \gamma_{31}S + 1
\end{bmatrix}.$$
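A DGP satisfying (i) to (v) can be generated from a handful of free parameters: $\tilde{\alpha}$, $\alpha$, the third rows of $A_2$ and $A_3$, and $\gamma_{31}$. The sketch below (hypothetical helper name and parameter values) computes $K$ and $G$ from $\tilde{\alpha}$, makes every row of $A_2$, $A_3$ and $\gamma$ proportional to $(G, K, 1)$, recovers $A_1$ from $\gamma\alpha' = A_1 + A_2 + A_3 - I_3$, and checks the restrictions numerically. It does not check stability; that is the role of the eigenvalue condition on $F$.

```python
import numpy as np


def sccf_var3(alpha_tilde, alpha, a2_row, a3_row, gamma31):
    """Build A1, A2, A3 and gamma for the trivariate VAR(3) of Appendix B."""
    R = alpha_tilde / alpha_tilde[0, :]        # R[i, j] = alpha~_{i+1, j+1} / alpha~_{1, j+1}
    K = (R[2, 1] - R[2, 0]) / (R[1, 0] - R[1, 1])
    G = -(R[1, 0] * K + R[2, 0])
    c = np.array([G, K, 1.0])                  # alpha_tilde' c = 0 by construction
    A2 = np.outer(c, a2_row)
    A3 = np.outer(c, a3_row)
    gamma = gamma31 * c
    A1 = np.eye(3) + np.outer(gamma, alpha) - A2 - A3
    S = alpha @ c                              # S = alpha_11 G + alpha_21 K + alpha_31
    return A1, A2, A3, gamma, G, K, S


alpha_tilde = np.array([[1.0, 1.0], [0.5, -0.3], [-0.2, 0.4]])    # hypothetical
alpha = np.array([1.0, -0.5, 0.2])                                 # hypothetical
a2_row, a3_row, gamma31 = np.array([0.1, 0.2, 0.3]), np.array([0.05, -0.1, 0.1]), 0.4
A1, A2, A3, gamma, G, K, S = sccf_var3(alpha_tilde, alpha, a2_row, a3_row, gamma31)
# K = 0.75, G = -0.175, S = -0.35
```

Since $\tilde{\alpha}'\gamma = 0$ holds as well as $\tilde{\alpha}'A_i^* = 0$, these DGPs satisfy the strong form of Definition 4, not only the weak form.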

If all the eigenvalues of the matrix $F$ lie inside the unit circle, then the VAR (24) is covariance-stationary. An eigenvalue of $F$ is a number $\lambda$ such that

$$|F - \lambda I_7| = 0. \qquad (25)$$

Expanding the determinant in (25) gives the characteristic equation:

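$$0 = \lambda^7 - \left[(1 + \gamma_{31}S) - \left(a^3_{33} + a^3_{32}K + a^3_{31}G\right) - \left(a^2_{33} + a^2_{32}K + a^2_{31}G\right)\right]\lambda^6 - \left(a^2_{33} + a^2_{32}K + a^2_{31}G\right)\lambda^5 - \left(a^3_{33} + a^3_{32}K + a^3_{31}G\right)\lambda^4. \qquad (26)$$

To simplify notation, if we define $\Omega = -\left[(1 + \gamma_{31}S) - \left(a^3_{33} + a^3_{32}K + a^3_{31}G\right) - \left(a^2_{33} + a^2_{32}K + a^2_{31}G\right)\right]$, $\Theta = -\left(a^2_{33} + a^2_{32}K + a^2_{31}G\right)$ and $\Psi = -\left(a^3_{33} + a^3_{32}K + a^3_{31}G\right)$, then (26) becomes:

$$\lambda^7 + \Omega\lambda^6 + \Theta\lambda^5 + \Psi\lambda^4 = 0. \qquad (27)$$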

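The roots of this polynomial are $\lambda_1 = \lambda_2 = \lambda_3 = \lambda_4 = 0$, $\lambda_5 = A + B - \frac{\Omega}{3}$, $\lambda_6 = A\omega + B\omega^2 - \frac{\Omega}{3}$ and $\lambda_7 = A\omega^2 + B\omega - \frac{\Omega}{3}$, where $\omega = \frac{-1 + i\sqrt{3}}{2}$,

$$A = \sqrt[3]{-\frac{b}{2} + \sqrt{\frac{b^2}{4} + \frac{a^3}{27}}}, \qquad B = \sqrt[3]{-\frac{b}{2} - \sqrt{\frac{b^2}{4} + \frac{a^3}{27}}},$$

with the cube roots chosen so that $AB = -a/3$, $a = \frac{1}{3}\left(3\Theta - \Omega^2\right)$ and $b = \frac{1}{27}\left(2\Omega^3 - 9\Omega\Theta + 27\Psi\right)$. Requiring $|\lambda_5|$, $|\lambda_6|$ and $|\lambda_7|$ to be less than one guarantees that the DGPs satisfying the cointegration and SCCF restrictions are well-behaved VECMs.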
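The closed-form roots can be checked against a numerical eigen-decomposition of $F$. The sketch assumes illustrative values of $G$, $K$, $S$, the third rows of $A_2$ and $A_3$, and $\gamma_{31}$; it builds $F$ row by row as displayed above, evaluates $\lambda_5$, $\lambda_6$, $\lambda_7$ from $\Omega$, $\Theta$, $\Psi$, and compares them with `numpy.linalg.eigvals`. To avoid the cube-root branch ambiguity of complex arithmetic, $B$ is computed as $-a/(3A)$, which enforces $AB = -a/3$. The helper names are hypothetical.

```python
import numpy as np


def build_F(G, K, S, a2_row, a3_row, gamma31):
    """7 x 7 transition matrix F of (24) under restrictions (i)-(v)."""
    c = np.array([G, K, 1.0])
    w = np.concatenate([-(a2_row + a3_row), -a3_row, [gamma31]])   # third row of F
    F = np.zeros((7, 7))
    F[0:3, :] = np.outer(c, w)                 # rows 1-3 are G*w, K*w and w
    F[3:6, 0:3] = np.eye(3)                    # Delta y_{t-1} is carried forward
    F[6, :] = S * w                            # last row: S*w ...
    F[6, 6] += 1.0                             # ... plus 1 on alpha' y_{t-1}
    return F


def nonzero_roots(G, K, S, a2_row, a3_row, gamma31):
    """lambda_5, lambda_6, lambda_7 of (27) via Cardano's formula."""
    c = np.array([G, K, 1.0])
    p2, p3 = a2_row @ c, a3_row @ c            # a_33 + a_32 K + a_31 G for A2 and A3
    Omega, Theta, Psi = -((1 + gamma31 * S) - p3 - p2), -p2, -p3
    a = (3 * Theta - Omega**2) / 3
    b = (2 * Omega**3 - 9 * Omega * Theta + 27 * Psi) / 27
    root = np.sqrt(complex(b**2 / 4 + a**3 / 27))
    A = (-b / 2 + root) ** (1 / 3)             # principal complex cube root
    B = -a / (3 * A) if A != 0 else (-b / 2 - root) ** (1 / 3)
    w = (-1 + 1j * np.sqrt(3)) / 2
    return np.array([A + B, A * w + B * w**2, A * w**2 + B * w]) - Omega / 3


args = (-0.175, 0.75, -0.35, np.array([0.1, 0.2, 0.3]), np.array([0.05, -0.1, 0.1]), 0.4)
F = build_F(*args)
lam = nonzero_roots(*args)
eig = np.linalg.eigvals(F)
```

The four zero eigenvalues are defective (F has rank 5), so their numerical estimates are only accurate to roughly the cube root of machine precision; the comparison therefore focuses on the three nonzero roots.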

C System R² and signal-to-noise ratio

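In a multiple regression with stochastic regressors and i.i.d. errors, $y = X\beta + \varepsilon$, the limiting signal-to-noise ratio (snr) can be defined as:

$$\text{snr} = \frac{\beta'\lim_{T\to\infty}E\left(\frac{X'X}{T}\right)\beta}{\sigma_\varepsilon^2}, \qquad (28)$$

where $E(\varepsilon\varepsilon') = \sigma_\varepsilon^2 \cdot I$, and the proportion of the variation of the dependent variable explained by the model, i.e. the population $R^2$, is:

$$R^2 = \frac{\beta'\lim_{T\to\infty}E\left(\frac{X'X}{T}\right)\beta}{\sigma_\varepsilon^2 + \beta'\lim_{T\to\infty}E\left(\frac{X'X}{T}\right)\beta} = \frac{\text{snr}}{1 + \text{snr}}.$$

Since the asymptotic variance of $\sqrt{T}(\hat{\beta} - \beta)$ is $AVAR(\hat{\beta}) = \sigma_\varepsilon^2\left[\lim_{T\to\infty}E\left(\frac{X'X}{T}\right)\right]^{-1}$, we can write (28) as:

$$\text{snr} = \beta'\left[AVAR(\hat{\beta})\right]^{-1}\beta. \qquad (29)$$

Consider now a VAR(p):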

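$$y_t = A_1 y_{t-1} + \cdots + A_p y_{t-p} + \varepsilon_t. \qquad (30)$$

The analogous measure of snr for it is:

$$\text{snr} = \beta'\left(\Sigma \otimes \Omega^{-1}\right)\beta, \qquad (31)$$

where $\beta = \mathrm{vec}(A)$, $A = [A_1 \;\cdots\; A_p]$, $E(\varepsilon_t\varepsilon_t') = \Omega$, and

$$\Sigma = \begin{pmatrix} \Gamma_0 & \Gamma_1 & \cdots & \Gamma_{p-1} \\ \Gamma_1' & \Gamma_0 & \cdots & \Gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{p-1}' & \Gamma_{p-2}' & \cdots & \Gamma_0 \end{pmatrix},$$

where $\Gamma_j = E(y_t y_{t-j}')$. Notice that $\Sigma$ is completely determined by $(A, \Omega)$ via the Yule-Walker equations⁵. After some algebra, using $\Gamma_0 = A\Sigma A' + \Omega$, it can be shown that (31) is equal to:

$$\text{snr} = \beta'\left(\Sigma \otimes \Omega^{-1}\right)\beta = \mathrm{trace}\left(\Gamma_0\Omega^{-1}\right) - n.$$

Using this last result, one can then define the system $R^2$ to be:

$$R^2 = \frac{\mathrm{trace}\left(\Gamma_0\Omega^{-1}\right) - n}{1 + \mathrm{trace}\left(\Gamma_0\Omega^{-1}\right) - n}.$$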

5See Hamilton (1994) Chapter 10, L¨utkepohl (1993) Chapter 1, or Reinsel (1993) Chapter 2.
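The identity $\text{snr} = \beta'(\Sigma \otimes \Omega^{-1})\beta = \mathrm{trace}(\Gamma_0\Omega^{-1}) - n$, and hence the system $R^2$, can be verified numerically for any covariance-stationary, zero-mean VAR(p). The sketch below is one way to do it, not necessarily the authors': it obtains $\Sigma$, the covariance of the stacked vector $(y_t', \ldots, y_{t-p+1}')'$, from the companion form by solving $\mathrm{vec}(\Sigma) = (I - F \otimes F)^{-1}\mathrm{vec}(Q)$, takes $\Gamma_0$ as its top-left block, and evaluates both expressions for snr. Column-major reshaping (`order="F"`) is used so that it matches the vec operator; the function name and example coefficients are hypothetical.

```python
import numpy as np


def system_r2(A_list, Omega):
    """snr via (31), snr via trace(Gamma_0 Omega^{-1}) - n, and the system R^2
    for a covariance-stationary, zero-mean VAR(p) with coefficients A_list."""
    n, p = Omega.shape[0], len(A_list)
    m = n * p
    F = np.zeros((m, m))                                  # companion matrix
    F[:n, :] = np.hstack(A_list)
    F[n:, :-n] = np.eye(n * (p - 1))
    Q = np.zeros((m, m))
    Q[:n, :n] = Omega
    # Sigma = F Sigma F' + Q  <=>  vec(Sigma) = (I - F kron F)^{-1} vec(Q)
    vec_sigma = np.linalg.solve(np.eye(m * m) - np.kron(F, F), Q.reshape(-1, order="F"))
    Sigma = vec_sigma.reshape(m, m, order="F")            # same block layout as Sigma in the text
    Gamma0 = Sigma[:n, :n]
    beta = np.hstack(A_list).reshape(-1, order="F")       # vec(A), A = [A_1 ... A_p]
    Omega_inv = np.linalg.inv(Omega)
    snr_kron = beta @ np.kron(Sigma, Omega_inv) @ beta   # equation (31)
    snr_trace = np.trace(Gamma0 @ Omega_inv) - n
    return snr_kron, snr_trace, snr_trace / (1.0 + snr_trace)


# Univariate AR(1) with coefficient 0.5 and unit innovation variance:
# Gamma_0 = 1 / (1 - 0.25), snr = 1/3 and R^2 = 0.25.
snr_k, snr_t, r2 = system_r2([np.array([[0.5]])], np.array([[1.0]]))

# Hypothetical bivariate VAR(2)
A1 = np.array([[0.5, 0.1], [0.0, 0.4]])
A2 = np.array([[-0.2, 0.0], [0.1, 0.1]])
Omega = np.array([[1.0, 0.3], [0.3, 0.5]])
snr_k2, snr_t2, r2_2 = system_r2([A1, A2], Omega)
```

The AR(1) case reproduces the familiar result that the population R² of a stationary AR(1) equals the squared autoregressive coefficient.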

D Tables

Table 1 - Low system R² measure. Frequency of lag (p) and cointegrating vectors (q) choice by different criteria when the true model is (3,1) in levels.

| Criterion | Selected lag | T = 100, q = 0 | T = 100, q = 1 | T = 100, q = 2 | T = 100, q = 3 | T = 200, q = 0 | T = 200, q = 1 | T = 200, q = 2 | T = 200, q = 3 |
|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 0.01 | 0.36 | 0.07 | 0.05 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2 | 35.79 | 3.20 | 0.45 | 0.47 | 7.66 | 1.85 | 0.44 | 0.51 |
| | 3 | 40.32 | **8.23** | 1.06 | 0.60 | 42.60 | **31.47** | 5.14 | 3.98 |
| | 4 | 4.13 | 1.42 | 0.20 | 0.08 | 2.46 | 1.96 | 0.35 | 0.24 |
| | 5 | 1.14 | 0.56 | 0.07 | 0.02 | 0.41 | 0.41 | 0.07 | 0.04 |
| | 6 | 0.46 | 0.32 | 0.06 | 0.02 | 0.11 | 0.13 | 0.03 | 0.01 |
| | 7 | 0.27 | 0.21 | 0.04 | 0.01 | 0.04 | 0.04 | 0.00 | 0.01 |
| | 8 | 0.16 | 0.20 | 0.04 | 0.01 | 0.01 | 0.03 | 0.01 | 0.00 |
| HQ(p) | 1 | 0.26 | 2.53 | 0.50 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2 | 68.50 | 5.66 | 0.78 | 0.87 | 35.07 | 8.10 | 1.86 | 2.39 |
| | 3 | 15.34 | **4.26** | 0.56 | 0.27 | 24.47 | **21.95** | 3.46 | 2.63 |
| | 4 | 0.14 | 0.07 | 0.02 | 0.00 | 0.03 | 0.03 | 0.01 | 0.00 |
| | 5 | 0.01 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 3.08 | 11.69 | 2.10 | 1.13 | 0.02 | 0.05 | 0.02 | 0.00 |
| | 2 | 72.73 | 5.59 | 0.75 | 0.88 | 66.71 | 15.02 | 3.38 | 4.48 |
| | 3 | 1.34 | **0.63** | 0.07 | 0.03 | 3.90 | **5.12** | 0.78 | 0.53 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Numbers represent the percentage of times that the model selection criterion chose that cell, corresponding to the lag and number of cointegrating vectors, in 100,000 replications (1000 simulations of 100 different DGPs). The true lag-cointegrating vectors are identified by bold numbers.

Table 2 - High system R² measure. Frequency of lag (p) and cointegrating vectors (q) choice by different criteria when the true model is (3,1) in levels.

| Criterion | Selected lag | T = 100, q = 0 | T = 100, q = 1 | T = 100, q = 2 | T = 100, q = 3 | T = 200, q = 0 | T = 200, q = 1 | T = 200, q = 2 | T = 200, q = 3 |
|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2 | 0.00 | 32.93 | 2.69 | 1.37 | 0.00 | 8.64 | 1.08 | 0.67 |
| | 3 | 0.00 | **49.10** | 3.50 | 1.43 | 0.00 | **71.47** | 7.57 | 4.47 |
| | 4 | 0.01 | 5.24 | 0.46 | 0.16 | 0.00 | 4.17 | 0.47 | 0.26 |
| | 5 | 0.03 | 1.42 | 0.16 | 0.06 | 0.00 | 0.73 | 0.08 | 0.06 |
| | 6 | 0.04 | 0.59 | 0.08 | 0.02 | 0.00 | 0.20 | 0.03 | 0.02 |
| | 7 | 0.04 | 0.30 | 0.05 | 0.01 | 0.00 | 0.06 | 0.01 | 0.00 |
| | 8 | 0.04 | 0.23 | 0.05 | 0.01 | 0.00 | 0.02 | 0.00 | 0.00 |
| HQ(p) | 1 | 0.00 | 0.02 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2 | 0.00 | 68.34 | 5.55 | 2.87 | 0.00 | 39.86 | 5.15 | 3.17 |
| | 3 | 0.00 | **20.69** | 1.60 | 0.66 | 0.00 | **44.22** | 4.73 | 2.82 |
| | 4 | 0.00 | 0.23 | 0.02 | 0.01 | 0.00 | 0.05 | 0.01 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 0.00 | 0.42 | 0.10 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 2 | 0.00 | 86.26 | 6.96 | 3.60 | 0.00 | 74.78 | 9.46 | 6.00 |
| | 3 | 0.00 | **2.35** | 0.17 | 0.08 | 0.00 | **8.31** | 0.95 | 0.50 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Numbers represent the percentage of times that the model selection criterion chose that cell, corresponding to the lag and number of cointegrating vectors, in 100,000 replications (1000 simulations of 100 different DGPs). The true lag-cointegrating vectors are identified by bold numbers.

Table 3 - Low system R² measure. Frequency of lag-rank (p, r) and cointegrating vectors (q) choice by different criteria when the true model is (2,1,1) in differences.

Number of observations = 100

| Criterion | Lag | q=0, r=1 | q=0, r=2 | q=0, r=3 | q=1, r=1 | q=1, r=2 | q=1, r=3 | q=2, r=1 | q=2, r=2 | q=2, r=3 | q=3, r=1 | q=3, r=2 | q=3, r=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 6.74 | 1.99 | 0.35 | 0.60 | 0.24 | 0.05 | 0.09 | 0.03 | 0.01 | 0.09 | 0.04 | 0.01 |
| | 2 | 40.29 | 7.66 | 0.44 | **6.46** | 1.66 | 0.11 | 0.80 | 0.23 | 0.02 | 0.48 | 0.13 | 0.01 |
| | 3 | 7.46 | 2.16 | 0.06 | 1.78 | 0.60 | 0.02 | 0.19 | 0.08 | 0.01 | 0.08 | 0.02 | 0.00 |
| | 4 | 3.26 | 0.96 | 0.02 | 1.08 | 0.36 | 0.01 | 0.11 | 0.05 | 0.00 | 0.04 | 0.01 | 0.00 |
| | 5 | 2.16 | 0.45 | 0.01 | 0.90 | 0.29 | 0.00 | 0.11 | 0.05 | 0.00 | 0.03 | 0.01 | 0.00 |
| | 6 | 1.52 | 0.31 | 0.00 | 0.75 | 0.23 | 0.00 | 0.09 | 0.04 | 0.00 | 0.02 | 0.01 | 0.00 |
| | 7 | 1.30 | 0.24 | 0.00 | 0.91 | 0.20 | 0.00 | 0.12 | 0.03 | 0.00 | 0.03 | 0.01 | 0.00 |
| | 8 | 1.43 | 0.22 | 0.00 | 1.10 | 0.28 | 0.00 | 0.18 | 0.07 | 0.00 | 0.04 | 0.01 | 0.00 |
| HQ(p) | 1 | 22.11 | 1.62 | 0.13 | 2.00 | 0.20 | 0.02 | 0.28 | 0.02 | 0.00 | 0.30 | 0.03 | 0.00 |
| | 2 | 53.81 | 1.07 | 0.02 | **9.54** | 0.30 | 0.01 | 1.19 | 0.06 | 0.00 | 0.69 | 0.02 | 0.00 |
| | 3 | 3.39 | 0.07 | 0.00 | 1.07 | 0.03 | 0.00 | 0.11 | 0.01 | 0.00 | 0.05 | 0.00 | 0.00 |
| | 4 | 0.71 | 0.01 | 0.00 | 0.35 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 5 | 0.25 | 0.00 | 0.00 | 0.16 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.04 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.02 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 45.35 | 0.37 | 0.02 | 4.03 | 0.06 | 0.00 | 0.56 | 0.01 | 0.00 | 0.59 | 0.01 | 0.00 |
| | 2 | 38.87 | 0.03 | 0.00 | **7.84** | 0.01 | 0.00 | 0.97 | 0.00 | 0.00 | 0.54 | 0.00 | 0.00 |
| | 3 | 0.47 | 0.00 | 0.00 | 0.20 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 4 | 0.02 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Number of observations = 200

| Criterion | Lag | q=0, r=1 | q=0, r=2 | q=0, r=3 | q=1, r=1 | q=1, r=2 | q=1, r=3 | q=2, r=1 | q=2, r=2 | q=2, r=3 | q=3, r=1 | q=3, r=2 | q=3, r=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 0.60 | 0.21 | 0.03 | 0.15 | 0.07 | 0.02 | 0.04 | 0.01 | 0.00 | 0.06 | 0.01 | 0.00 |
| | 2 | 34.86 | 5.37 | 0.26 | **24.18** | 4.04 | 0.16 | 3.88 | 0.69 | 0.04 | 3.05 | 0.56 | 0.03 |
| | 3 | 5.03 | 1.14 | 0.02 | 3.63 | 0.80 | 0.02 | 0.51 | 0.17 | 0.00 | 0.34 | 0.09 | 0.00 |
| | 4 | 2.00 | 0.37 | 0.00 | 1.52 | 0.30 | 0.00 | 0.20 | 0.05 | 0.00 | 0.11 | 0.02 | 0.00 |
| | 5 | 1.05 | 0.17 | 0.00 | 0.82 | 0.15 | 0.00 | 0.11 | 0.02 | 0.00 | 0.06 | 0.01 | 0.00 |
| | 6 | 0.63 | 0.05 | 0.00 | 0.51 | 0.07 | 0.00 | 0.06 | 0.01 | 0.00 | 0.03 | 0.00 | 0.00 |
| | 7 | 0.39 | 0.04 | 0.00 | 0.37 | 0.04 | 0.00 | 0.03 | 0.01 | 0.00 | 0.02 | 0.00 | 0.00 |
| | 8 | 0.33 | 0.02 | 0.00 | 0.28 | 0.02 | 0.00 | 0.04 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| HQ(p) | 1 | 3.92 | 0.20 | 0.01 | 0.99 | 0.07 | 0.01 | 0.22 | 0.01 | 0.00 | 0.29 | 0.01 | 0.00 |
| | 2 | 47.07 | 0.35 | 0.00 | **33.59** | 0.31 | 0.00 | 5.48 | 0.05 | 0.00 | 4.25 | 0.05 | 0.00 |
| | 3 | 1.32 | 0.00 | 0.00 | 1.12 | 0.01 | 0.00 | 0.16 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00 |
| | 4 | 0.17 | 0.00 | 0.00 | 0.15 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 5 | 0.03 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 14.51 | 0.05 | 0.00 | 3.47 | 0.01 | 0.00 | 0.77 | 0.00 | 0.00 | 1.00 | 0.00 | 0.00 |
| | 2 | 40.30 | 0.00 | 0.00 | **30.85** | 0.01 | 0.00 | 4.96 | 0.00 | 0.00 | 3.83 | 0.00 | 0.00 |
| | 3 | 0.11 | 0.00 | 0.00 | 0.10 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Numbers represent the percentage of times that the model selection criterion chose that cell, corresponding to the lag-rank and number of cointegrating vectors, in 100,000 replications (1000 simulations of 100 different DGPs). The true lag-rank-cointegrating vectors are identified by bold numbers.

Table 4 - High system R² measure. Frequency of lag-rank (p, r) and cointegrating vectors (q) choice by different criteria when the true model is (2,1,1) in differences.

Number of observations = 100

| Criterion | Lag | q=0, r=1 | q=0, r=2 | q=0, r=3 | q=1, r=1 | q=1, r=2 | q=1, r=3 | q=2, r=1 | q=2, r=2 | q=2, r=3 | q=3, r=1 | q=3, r=2 | q=3, r=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 0.00 | 0.00 | 0.00 | 6.95 | 2.03 | 0.36 | 0.55 | 0.21 | 0.04 | 0.27 | 0.09 | 0.03 |
| | 2 | 0.00 | 0.00 | 0.00 | **45.59** | 8.86 | 0.50 | 2.87 | 0.77 | 0.05 | 1.18 | 0.30 | 0.02 |
| | 3 | 0.01 | 0.00 | 0.00 | 8.66 | 2.41 | 0.08 | 0.52 | 0.22 | 0.01 | 0.19 | 0.08 | 0.00 |
| | 4 | 0.17 | 0.03 | 0.00 | 3.92 | 1.07 | 0.02 | 0.27 | 0.12 | 0.00 | 0.06 | 0.03 | 0.00 |
| | 5 | 0.28 | 0.05 | 0.00 | 2.42 | 0.57 | 0.01 | 0.14 | 0.07 | 0.00 | 0.04 | 0.01 | 0.00 |
| | 6 | 0.35 | 0.06 | 0.00 | 1.67 | 0.35 | 0.01 | 0.12 | 0.04 | 0.00 | 0.03 | 0.01 | 0.00 |
| | 7 | 0.41 | 0.08 | 0.00 | 1.41 | 0.30 | 0.00 | 0.10 | 0.07 | 0.00 | 0.02 | 0.00 | 0.00 |
| | 8 | 0.56 | 0.12 | 0.00 | 1.61 | 0.31 | 0.00 | 0.13 | 0.06 | 0.00 | 0.02 | 0.01 | 0.00 |
| HQ(p) | 1 | 0.00 | 0.00 | 0.00 | 22.26 | 1.50 | 0.17 | 1.82 | 0.18 | 0.02 | 0.92 | 0.08 | 0.01 |
| | 2 | 0.00 | 0.00 | 0.00 | **59.97** | 1.29 | 0.01 | 3.93 | 0.15 | 0.00 | 1.64 | 0.05 | 0.00 |
| | 3 | 0.01 | 0.00 | 0.00 | 3.99 | 0.08 | 0.00 | 0.26 | 0.02 | 0.00 | 0.09 | 0.01 | 0.00 |
| | 4 | 0.04 | 0.00 | 0.00 | 0.86 | 0.01 | 0.00 | 0.05 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 5 | 0.03 | 0.00 | 0.00 | 0.27 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 6 | 0.01 | 0.00 | 0.00 | 0.05 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 0.00 | 0.00 | 0.00 | 45.70 | 0.38 | 0.02 | 3.69 | 0.05 | 0.01 | 1.92 | 0.02 | 0.00 |
| | 2 | 0.00 | 0.00 | 0.00 | **43.39** | 0.04 | 0.00 | 2.89 | 0.01 | 0.00 | 1.21 | 0.00 | 0.00 |
| | 3 | 0.01 | 0.00 | 0.00 | 0.59 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Number of observations = 200

| Criterion | Lag | q=0, r=1 | q=0, r=2 | q=0, r=3 | q=1, r=1 | q=1, r=2 | q=1, r=3 | q=2, r=1 | q=2, r=2 | q=2, r=3 | q=3, r=1 | q=3, r=2 | q=3, r=3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AIC(p) | 1 | 0.00 | 0.00 | 0.00 | 0.75 | 0.30 | 0.06 | 0.09 | 0.03 | 0.01 | 0.06 | 0.03 | 0.01 |
| | 2 | 0.00 | 0.00 | 0.00 | **57.13** | 8.98 | 0.45 | 5.90 | 0.98 | 0.05 | 3.48 | 0.60 | 0.03 |
| | 3 | 0.00 | 0.00 | 0.00 | 8.15 | 1.79 | 0.05 | 0.80 | 0.18 | 0.00 | 0.43 | 0.11 | 0.00 |
| | 4 | 0.00 | 0.00 | 0.00 | 3.31 | 0.63 | 0.01 | 0.30 | 0.06 | 0.00 | 0.14 | 0.04 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 1.80 | 0.24 | 0.00 | 0.14 | 0.03 | 0.00 | 0.07 | 0.02 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 1.00 | 0.13 | 0.00 | 0.09 | 0.02 | 0.00 | 0.04 | 0.01 | 0.00 |
| | 7 | 0.01 | 0.00 | 0.00 | 0.70 | 0.07 | 0.00 | 0.07 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 |
| | 8 | 0.02 | 0.00 | 0.00 | 0.49 | 0.04 | 0.00 | 0.04 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 |
| HQ(p) | 1 | 0.00 | 0.00 | 0.00 | 4.83 | 0.24 | 0.02 | 0.61 | 0.03 | 0.00 | 0.36 | 0.02 | 0.00 |
| | 2 | 0.00 | 0.00 | 0.00 | **77.30** | 0.62 | 0.01 | 8.08 | 0.07 | 0.00 | 4.81 | 0.04 | 0.00 |
| | 3 | 0.00 | 0.00 | 0.00 | 2.25 | 0.01 | 0.00 | 0.22 | 0.00 | 0.00 | 0.11 | 0.00 | 0.00 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.27 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| SC(p) | 1 | 0.00 | 0.00 | 0.00 | 17.32 | 0.06 | 0.00 | 2.23 | 0.01 | 0.00 | 1.37 | 0.01 | 0.00 |
| | 2 | 0.00 | 0.00 | 0.00 | **67.58** | 0.01 | 0.00 | 7.03 | 0.00 | 0.00 | 4.19 | 0.00 | 0.00 |
| | 3 | 0.00 | 0.00 | 0.00 | 0.16 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 |
| | 4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 5 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 7 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | 8 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

Numbers represent the percentage of times that the model selection criterion chose that cell, corresponding to the lag-rank and number of cointegrating vectors, in 100,000 replications (1000 simulations of 100 different DGPs). The true lag-rank-cointegrating vectors are identified by bold numbers.

Table 5 - Low system R² measure. Percentage improvement in different measures of accuracy in forecasts generated by the possibly reduced-rank VAR over the full-rank VAR chosen by the same model selection criterion when the true models are trivariate (3,2,1).

| Sample size | Horizon (h) | AIC GFESM | AIC \|MSFE\| | AIC TMSFE | HQ GFESM | HQ \|MSFE\| | HQ TMSFE | SC GFESM | SC \|MSFE\| | SC TMSFE |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 4.26 | 4.26 | 1.28 | 10.04 | 10.04 | 3.43 | 19.65 | 19.65 | 7.24 |
| 100 | 4 | 0.08 | -0.67 | 0.32 | 13.27 | 0.15 | 1.11 | 26.76 | 0.73 | 2.05 |
| 100 | 8 | -4.78 | -0.33 | 0.03 | 13.75 | 0.16 | 0.56 | 30.00 | 0.77 | 1.14 |
| 100 | 12 | -5.94 | -0.23 | -0.01 | 14.14 | 0.05 | 0.39 | 32.28 | 0.57 | 0.83 |
| 100 | 16 | -6.55 | -0.15 | -0.02 | 14.57 | 0.14 | 0.30 | 34.43 | 0.57 | 0.67 |
| 200 | 1 | 3.92 | 3.92 | 1.26 | 6.52 | 6.52 | 2.19 | 7.85 | 7.85 | 2.73 |
| 200 | 4 | 5.95 | 0.32 | 0.66 | 10.93 | 0.41 | 0.95 | 12.52 | 0.03 | 0.93 |
| 200 | 8 | 5.64 | 0.13 | 0.34 | 11.88 | 0.13 | 0.50 | 14.04 | 0.18 | 0.49 |
| 200 | 12 | 5.95 | 0.03 | 0.24 | 12.50 | 0.04 | 0.35 | 15.02 | 0.05 | 0.35 |
| 200 | 16 | 6.21 | 0.01 | 0.18 | 12.93 | 0.03 | 0.27 | 15.68 | 0.01 | 0.27 |

GFESM is Clements and Hendry's generalized forecast error second moment measure, |MSFE| is the determinant of the mean squared forecast error matrix and TMSFE is the trace of the MSFE matrix. Bold and underlined numbers denote, respectively, the best and the worst forecasting performance across all three information criteria based on TMSFE.

Table 6 - High system R² measure. Percentage improvement in different measures of accuracy in forecasts generated by the possibly reduced-rank VAR over the full-rank VAR chosen by the same model selection criterion when the true models are trivariate (3,2,1).

| Sample size | Horizon (h) | AIC GFESM | AIC \|MSFE\| | AIC TMSFE | HQ GFESM | HQ \|MSFE\| | HQ TMSFE | SC GFESM | SC \|MSFE\| | SC TMSFE |
|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 1 | 133.89 | 133.89 | 70.21 | 109.02 | 109.02 | 58.18 | 95.47 | 95.47 | 50.95 |
| 100 | 4 | 96.19 | 0.83 | 16.67 | 53.69 | -3.20 | 11.14 | 33.99 | -3.55 | 8.40 |
| 100 | 8 | 48.25 | 0.79 | 8.43 | 7.68 | 0.53 | 5.40 | -7.84 | 0.35 | 4.00 |
| 100 | 12 | 18.97 | -0.02 | 5.66 | -17.83 | 0.17 | 3.62 | -28.56 | 0.10 | 2.66 |
| 100 | 16 | 1.04 | -0.10 | 4.24 | -30.76 | 0.03 | 2.71 | -38.02 | 0.04 | 1.99 |
| 200 | 1 | 131.78 | 131.78 | 73.16 | 117.68 | 117.68 | 65.33 | 99.72 | 99.72 | 54.72 |
| 200 | 4 | 73.98 | -2.39 | 15.48 | 52.26 | -5.19 | 12.19 | 36.35 | -6.38 | 9.24 |
| 200 | 8 | 18.16 | 0.16 | 7.39 | -1.67 | 0.18 | 5.61 | -14.55 | -0.01 | 4.08 |
| 200 | 12 | -13.54 | 0.05 | 4.89 | -30.99 | 0.16 | 3.72 | -41.82 | 0.06 | 2.68 |
| 200 | 16 | -31.01 | -0.03 | 3.65 | -45.16 | 0.00 | 2.78 | -54.60 | 0.01 | 2.00 |

GFESM is Clements and Hendry's generalized forecast error second moment measure, |MSFE| is the determinant of the mean squared forecast error matrix and TMSFE is the trace of the MSFE matrix. Bold and underlined numbers denote, respectively, the best and the worst forecasting performance across all three information criteria based on TMSFE.
