Chapter 10: Basic Series Regression

1. Time series have a temporal ordering. The past can affect the future, but not vice versa. That is the biggest difference from cross-sectional data.

2. In time series data there is a variable that serves as an index for the time periods. That variable also indicates the frequency of observations. For instance, we have yearly data if one observation becomes available each year. The Stata command tsset declares that the data are time series.

3. A regression that uses time series can have a causal interpretation, provided that the regressor series is uncorrelated with the error series. The regression result will be biased when relevant series are omitted. More often, time series are used for forecasting purposes.

4. Consider the causal study first.

(a) For simplicity a simple regression is considered

Yt = β0 + β1Xt + ut (1)

where the subscript t emphasizes that the data are time series.

(b) Model (1) is a static model because it captures only the immediate, or contemporaneous, effect of X on Y. When Xt changes by one unit, it affects only Yt (in the same period). In other words, Yt+1, Yt+2, and so on are unaffected. In light of this, model (1) is static.

(c) It is easy to account for lag effects by using lagged values. There are the first lag, the second lag, and so on. The table below is one example

t    Yt     Yt−1   Yt−2   ∆Yt
1    2.2    na     na     na
2    3      2.2    na     0.8
3    −2     3      2.2    −5

i. t is the index series. t = 1 corresponds to the first (or earliest) observation.

ii. Yt−1 is the first lag of Yt. When t = 2, Yt−1 = Y2−1 = Y1 = 2.2. So the second observation of Yt−1 is the first observation of Yt. The first observation of Yt−1 is a missing value (denoted by na) because there is no data for Y0. Essentially, you get the first lag by pushing the whole series one period down.

iii. Yt−2 is the second lag of Yt. You get the second lag by pushing the whole series two periods down (and two missing values are generated).

iv. ∆Yt is the first difference. By definition,

∆Yt ≡ Yt − Yt−1 (2)

For example, when t = 2, ∆Y2 ≡ Y2 − Y2−1 = 3 − 2.2 = 0.8. ∆Y1 is missing since Y0 is missing.

v. Exercise: Use the above table and find ∆Yt−1 for t = 1, 2, 3.

vi. The Stata commands to generate the first and second lags are

gen ylag1 = y[_n-1]
gen ylag2 = y[_n-2]

vii. Alternatively, you can refer to the first and second lags by using the lag operator L (after declaring the data as time series with tsset)

L.y, L2.y
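The lag and difference mechanics in the table above can also be reproduced outside Stata. Below is a minimal sketch in Python using pandas (the `shift` and `diff` calls are this sketch's choices, not part of the course's Stata workflow):

```python
import pandas as pd

# the example series from the table above: Y1 = 2.2, Y2 = 3, Y3 = -2
y = pd.Series([2.2, 3.0, -2.0], index=[1, 2, 3], name="Y")

ylag1 = y.shift(1)   # first lag: push the series one period down
ylag2 = y.shift(2)   # second lag: push it two periods down
dy = y.diff()        # first difference: Yt - Yt-1

table = pd.DataFrame({"Yt": y, "Yt-1": ylag1, "Yt-2": ylag2, "dYt": dy})
print(table)
```

Note how `shift` generates the missing values (NaN, pandas' analogue of Stata's missing) at the top of each lagged column, exactly as in the table.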

(d) The distributed lag (DL) model uses lagged values of X to account for the lag effect. A DL model with two lags is

Yt = β0 + δ0Xt + δ1Xt−1 + δ2Xt−2 + ut (3)

where the parameters δj are called (short-run) multipliers.

i. It follows that a change in Xt has an effect on Yt, Yt+1 and Yt+2. After that, the effect of Xt vanishes.

ii. Mathematically, we can show

dYt/dXt = δ0 (4)
dYt+1/dXt = δ1 (5)
dYt+2/dXt = δ2 (6)
dYt+j/dXt = 0, (j > 2) (7)

where dYt+1/dXt can be obtained from the regression in which Yt+1 is the dependent variable.

iii. When we graph δj as a function of j, we obtain the lag distribution.

iv. The cumulative effect of Xt on Yt, Yt+1 and Yt+2 is called the long-run propensity (LRP) or long-run multiplier.

LRP ≡ δ0 + δ1 + δ2 (8)

There are two steps to obtain the point estimate and standard error for the LRP. First,

write θ = δ0 + δ1 + δ2, which implies that δ0 = θ − δ1 − δ2. Then equation (3) becomes

Yt = β0 + δ0Xt + δ1Xt−1 + δ2Xt−2 + ut (9)

= β0 + (θ − δ1 − δ2)Xt + δ1Xt−1 + δ2Xt−2 + ut (10)

= β0 + θXt + δ1(Xt−1 − Xt) + δ2(Xt−2 − Xt) + ut (11)

The last equation suggests running the transformed regression of Yt onto Xt, (Xt−1 − Xt) and (Xt−2 − Xt). Then θ̂ is the LRP estimate, se(θ̂) is its standard error, and the confidence interval for θ is the confidence interval for the LRP.

v. Due to multicollinearity, it can be difficult to obtain precise estimates of the short-run multipliers δj. However, we can often get a good estimate of the LRP.

vi. To mitigate multicollinearity, we can impose restrictions on the lag distribution. For example, if we assume the lag distribution is flat, i.e., δ0 = δ1 = δ2 = δ, then model (3) reduces to

Yt = β0 + δZt + ut, (Zt ≡ Xt + Xt−1 + Xt−2) (12)
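The two-step LRP trick can be checked numerically. The sketch below (Python with simulated data and made-up coefficients, not the textbook data set) runs both the original DL regression and the transformed regression, and confirms that θ̂ from the latter equals δ̂0 + δ̂1 + δ̂2 from the former:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
u = rng.normal(scale=0.1, size=n)

# simulate Yt = 1 + 0.5*Xt + 0.3*Xt-1 + 0.2*Xt-2 + ut  (true LRP = 1.0; made-up values)
y = np.zeros(n)
y[2:] = 1 + 0.5 * x[2:] + 0.3 * x[1:-1] + 0.2 * x[:-2] + u[2:]

# drop the first two observations, where the lags are missing
Y = y[2:]
X0, X1, X2 = x[2:], x[1:-1], x[:-2]
ones = np.ones_like(Y)

# original DL regression: coefficients are delta0, delta1, delta2
b_dl, *_ = np.linalg.lstsq(np.column_stack([ones, X0, X1, X2]), Y, rcond=None)
lrp_sum = b_dl[1] + b_dl[2] + b_dl[3]

# transformed regression: Y on Xt, (Xt-1 - Xt), (Xt-2 - Xt);
# the coefficient on Xt is theta = LRP
b_tr, *_ = np.linalg.lstsq(np.column_stack([ones, X0, X1 - X0, X2 - X0]), Y, rcond=None)
theta = b_tr[1]

print(lrp_sum, theta)  # the two estimates agree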

(e) In theory we would need to include infinitely many lags of X if the effect of X never dies out, which is infeasible. Instead we may consider an autoregressive distributed lag (ADL) model given by

Yt = β0 + ρYt−1 + δXt + ut (13)

By recursive substitution

Yt+1 = β0 + ρYt + δXt+1 + ut+1 (14)

= β0 + ρ(β0 + ρYt−1 + δXt + ut) + δXt+1 + ut+1 (15)

= ... + ρδXt + ... (16)

we can show the effect of X on Y is

dYt/dXt = δ, dYt+1/dXt = ρδ, . . . , dYt+j/dXt = ρ^j δ (17)

Alternatively, you can apply the chain rule of calculus and get

dYt+1/dXt = (dYt+1/dYt)(dYt/dXt) = ρδ (18)
dYt+2/dXt = (dYt+2/dYt+1)(dYt+1/dXt) = ρ(ρδ) = ρ^2 δ (19)

The effect of X decreases exponentially if |ρ| < 1, but never becomes zero no matter how large j is.
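The geometric decay of these multipliers can be verified by simulating the ADL recursion directly. The Python sketch below (with made-up values ρ = 0.8, δ = 0.5) shocks Xt by one unit at j = 0 and tracks the response of Yt+j:

```python
# dynamic multipliers of the ADL model Yt = b0 + rho*Yt-1 + delta*Xt + ut,
# computed by feeding a one-unit shock to X through the recursion (no noise)
rho, delta = 0.8, 0.5  # made-up parameter values for illustration

def multipliers(rho, delta, horizon):
    """Response of Y_{t+j} to a one-unit change in X_t, for j = 0..horizon."""
    effects = []
    y_effect = 0.0
    for j in range(horizon + 1):
        x_shock = 1.0 if j == 0 else 0.0  # X changes only in period 0
        y_effect = rho * y_effect + delta * x_shock
        effects.append(y_effect)
    return effects

m = multipliers(rho, delta, 5)
print(m)  # each entry matches the formula dYt+j/dXt = rho**j * delta
```

With |ρ| < 1 the entries shrink geometrically but never reach exactly zero, which is the point made above.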

5. We get the autoregressive (AR) model (or autoregression) if X is excluded from the ADL model. More explicitly, a first-order autoregression is

Yt = β0 + ρYt−1 + ut (20)

(a) The AR model is more suitable for forecasting than the ADL and DL models since we do not need to forecast X before forecasting Y. This is because in the AR model the regressor is just the lagged dependent variable.

(b) The AR model can be used to test for serial correlation. A series is serially correlated if ρ̂ in (20) is significantly different from 0 (when its p-value is less than 0.05).

(c) (Optional) A time series has a unit root (and becomes non-stationary) if it is extremely serially correlated (highly persistent). In theory, a series has a unit root if ρ = 1 in (20). In practice, ρ̂ ≈ 1 is suggestive of a unit root. We can show that the variance of a unit root series goes to infinity as t rises. The conventional law of large numbers and central limit theorem both require finite variance, so neither applies to a unit root series. That means in general we need to difference a unit root series (to make it stationary) before using it in a regression. One exception: you do not need to difference unit root series if they are cointegrated. A unit root series is also called an integrated of order one, or I(1), series.

(d) The AR model with one lag can be fitted by the Stata command

reg y L.y

(e) Then the one-period ahead forecast is computed as

Ŷt+1 = β̂0 + ρ̂Yt (21)

and the two-period ahead forecast is

Ŷt+2 = β̂0 + ρ̂Ŷt+1 (22)

In general the h-period ahead forecast is

Ŷt+h = β̂0 + ρ̂Ŷt+h−1, (h = 2, 3, . . .) (23)

In short, all forecasts can be computed in an iterative (recursive) fashion.

(f) The AR model with two lags can be fitted by the Stata command

reg y L.y L2.y

(g) We can add a third lag, a fourth lag, and so on until the new lag has an insignificant coefficient.

6. If the time series is trending, we can add a linear trend (and a quadratic trend) to the AR model and run the regression below:

Yt = β0 + ρYt−1 + ct + ut (24)

where t is the linear trend. According to the Frisch-Waugh theorem, the estimated coefficient ρ̂ in (24) can be obtained by regressing the detrended Yt onto the detrended Yt−1, where the detrended series is the residual from regressing Yt on the trend. In short, ρ̂ in (24) measures the effect on the deviation of the series around its trend.
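The Frisch-Waugh claim can be checked numerically. The sketch below (Python with simulated data and made-up parameters, not the course's Stata code) detrends Yt and Yt−1 by regressing each on a constant and the trend, then verifies that regressing one set of residuals on the other reproduces ρ̂ from the full regression (24):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
t = np.arange(1, n + 1, dtype=float)

# simulate a trending AR(1)-type series (made-up parameters)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 1.0 + 0.5 * y[i - 1] + 0.2 * t[i] + rng.normal()

Y, Ylag, T = y[1:], y[:-1], t[1:]
ones = np.ones_like(Y)

# full regression (24): Y on constant, lagged Y, and trend
b_full, *_ = np.linalg.lstsq(np.column_stack([ones, Ylag, T]), Y, rcond=None)
rho_full = b_full[1]

def detrend(v):
    """Residual from regressing v on a constant and the linear trend."""
    Z = np.column_stack([ones, T])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

# Frisch-Waugh: regress detrended Y on detrended Ylag (no constant needed)
yd, ylagd = detrend(Y), detrend(Ylag)
rho_fw = (ylagd @ yd) / (ylagd @ ylagd)

print(rho_full, rho_fw)  # identical up to floating point
```

The two numbers agree by construction: partialling out the trend first and running the full regression are algebraically equivalent.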

7. The trend term can also be used in the DL model as a proxy for unobserved trending factors.

8. If we have monthly or quarterly data, there may be a regular up-and-down pattern at a specific interval. This phenomenon is called seasonality. For instance, sales data typically go up in November and December. It is easy to account for seasonality by using seasonal dummies that equal one during a particular season.
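As an illustration, seasonal dummies for monthly data can be generated in one line with pandas (a Python sketch; in Stata you would build the dummies from the date variable instead):

```python
import pandas as pd

# 24 monthly observations; get_dummies creates one 0/1 column per month
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
df = pd.DataFrame({"month": dates.month}, index=dates)
dummies = pd.get_dummies(df["month"], prefix="m")

# e.g. the column m_12 equals one only in December
print(int(dummies["m_12"].sum()))  # 2 Decembers in 24 months
```

In a regression, one month is dropped as the base category and the remaining eleven dummies are included alongside the other regressors.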

Example: Chapter 10

1. The data are in 311_fertil3.dta. See Examples 10.4 and 10.8 in the textbook for details.

2. The dependent variable is the general fertility rate (gfr). It is the number of children born per 1000 women of childbearing age. The key independent variable is the personal tax exemption (pe). We want to control for two variables: ww2 is a dummy variable that equals 1 during World War II (years 1941-1945), and pill is another dummy variable that equals one after 1963 (the year the birth control pill became available).

3. The data contain 72 yearly observations from 1913 to 1984. In 1913, for example, gfr = 124.7, pe = 0 and pill = 0.

4. The Stata command twoway line gfr pe year draws the time series plot of gfr (red line) and pe (blue line). It shows that pe was trending upward before the mid-1940s. Starting in 1950, both gfr and pe have been trending downward overall. We notice that the biggest rise in pe happened around WWII.

5. The Stata command reg gfr pe ww2 pill estimates the static model (10.18) in the textbook. All variables are significant, and the signs of their coefficients are as expected. The coefficient of pe indicates that a 12 (= 1/0.083) dollar increase in pe will increase gfr by about one birth per 1000 women. Being in WWII causes 24 fewer births per 1000 women. Having access to the birth control pill causes 31 fewer births per 1000 women.

6. The static model has the drawback that it captures only the immediate, or contemporaneous, effect of pe on gfr. By contrast, the distributed lag model can capture both immediate and lag effects. We cannot ignore the lag effect as long as it takes a while for people to adjust. After generating the first and second lags of pe, the Stata command

reg gfr pe pelag1 pelag2 ww2 pill

reports the distributed lag model (10.19) in the textbook. The F test shows that pe and its two lags are jointly significant, even though each is individually insignificant. The individual insignificance is due to the correlation among pe and its lagged values.

7. Next we run the transformed regression (shown at the top of page 359 in the 5th edition) in order to estimate the long-run propensity (LRP). The estimated LRP is .1007191, with standard error .0298027.

8. We may worry that regression (10.18) produces a spurious (biased) result. From the time series plot, it is obvious that gfr and pe tended to move together (both moved down) after the 1950s. We may suspect that there is an omitted trending variable that is correlated with both gfr and pe. In other words, we suspect the observed correlation between gfr and pe is possibly due to that omitted variable, and so tells us nothing about causality.

9. Then we include a linear trend as a proxy for the unobserved trending factors. The coefficient of the trend is -1.149872 and significant. This confirms that gfr is trending downward overall. After the trending factor is controlled for, the coefficient of pe becomes .2788778, much bigger than without the trend. This new estimate measures the effect of pe on detrended gfr. Put differently, pe explains the deviation of gfr around its trend better than the level of gfr.

10. We obtain equation (10.35) in the textbook after using both the trend and the squared trend as proxies. The squared trend is significant, confirming the nonlinear trending behavior of gfr.

11. So far we have used these time series for the purpose of inferring causality. In reality, the main use of time series is forecasting. We cannot use the Stata time series commands until we declare the data to be time series, which is done by the command tsset year.

12. Autoregression (AR) is one example of time series model. It is a popular forecasting model because it uses a variable’s own history to predict its future. In other words, no other variables are needed. Regressors in autoregression consist of lagged dependent variables only.

13. Autoregression is rationalized by the fact that most time series are serially correlated. That is, the current level of most series is correlated with previous or earlier level. Autoregression utilizes the idea that history can repeat itself.

14. We start with the first order AR model for gfr. The stata command is

reg gfr L.gfr

where L.gfr refers to the first lag value gfrt−1. By using the lag operator L, we can skip generating the lagged value with [_n-1].

15. The estimated AR(1) model is

gfr̂t = 1.304937 + .9777202 gfrt−1

It is easy to obtain forecasts using the AR model. For example, if gfr1979 = 100, then the forecast of gfr in 1980 is gfr̂1980 = 1.304937 + .9777202(100) = 99.076957.
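The recursive forecasts in (21)-(23) can be reproduced directly from the estimated coefficients. A minimal Python sketch (the two numbers are the AR(1) estimates reported above; the starting value 100 is the same illustrative one):

```python
b0, rho = 1.304937, 0.9777202  # AR(1) estimates reported above

def forecast(y_last, h):
    """h-period-ahead forecasts, computed recursively from the last observed value."""
    path = []
    y = y_last
    for _ in range(h):
        y = b0 + rho * y  # Yhat_{t+1} = b0_hat + rho_hat * Yhat_t
        path.append(y)
    return path

# one- and two-period-ahead forecasts starting from gfr_1979 = 100
print(forecast(100.0, 2))
```

The first element of the returned path is the one-period-ahead forecast 99.076957; each later forecast plugs in the previous forecast rather than an observed value.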

16. By contrast, if we use distributed lag model, then we need to forecast X before fore- casting Y. This is a harder job.

17. The AR(2) model (with two lags) can be estimated by the command

reg gfr L.gfr L2.gfr

where L2.gfr refers to the second lag gfrt−2. Both the first and second lags are significant, implying that the AR(1) model is mis-specified because it omits the second lag.

18. In theory we can keep adding lagged values until the newest lag becomes insignificant.

19. Exercise : Use the lag operator L and rewrite the stata command for regression (10.19) in textbook.

20. Exercise : Please estimate the following model

gfrt = β0 + β1pet + β2ww2t + β3pillt + β4gfrt−1 + ut

where the lagged dependent variable is used as proxy for the unobserved factors. Com- pare it to regression (10.34) in textbook.

21. Exercise : Continue the above exercise. Can you reject the hypothesis H0 : β4 = 1? How to estimate the restricted model that imposes this hypothesis?

[Figure: time series plot of gfr (births per 1000 women 15−44) and pe (real value of personal exemption, $), 1913 to 1984]

Do File

* Do file for chapter 10
set more off
clear
capture log close
cd "I:\311"
log using 311log.txt, text replace
use 311_fertil3.dta, clear
list in 1/5
sum pe
* time plot for gfr and pe
twoway line gfr pe year
* generate dummy variable for World War II (1941-1945)
gen ww2 = 0
replace ww2 = 1 if year==1941 | year==1942 | year==1943 | year==1944 | year==1945
* equation 10.18 in textbook, static model
reg gfr pe ww2 pill
* generate the first and second lagged value of pe
gen pelag1 = pe[_n-1]
gen pelag2 = pe[_n-2]
* equation 10.19 in textbook, finite distributed lag model
reg gfr pe pelag1 pelag2 ww2 pill
* F test for joint significance of pe and its lags
test pe pelag1 pelag2
* estimate LRP and its standard error based on transformed regression
gen dpe1 = pelag1 - pe
gen dpe2 = pelag2 - pe
reg gfr pe dpe1 dpe2 ww2 pill
* generate trend and squared trend
gen t = _n
gen tsq = t^2
* equation 10.34 in textbook
reg gfr pe ww2 pill t
* equation 10.35 in textbook

reg gfr pe ww2 pill t tsq
* declare time series data
tsset year
* AR(1) regression
reg gfr L.gfr
* AR(2) regression
reg gfr L.gfr L2.gfr
log close
exit
