Beginning Tutorials

Regression with Time Series Errors

David A. Dickey, North Carolina State University

Abstract: The basic assumptions of regression are reviewed. Graphical and statistical methods for checking the assumptions are presented using a sales example. Departures from independence in the errors are emphasized and illustrated in the example. Several products from SAS Institute™ for analyzing regressions with time series errors are illustrated. The importance of the stochastic properties of the model input variables is emphasized. Forecasts from several models for the example data are compared.

1. Introduction:

Regression is a tool that allows one to model the relationship between a response variable Y, which might be a mail order company's sales, and some explanatory variables usually denoted X_j, where X_1 might be the cost of one item from the company, X_2 the cost of a similar item from a competitor company, and X_3 the number of phone calls coming in to the company's switchboard. A typical regression model for this situation is

Y = β_0 + β_1 X_1 + β_2 X_2 + β_3 X_3 + e

where the regression coefficients, the βs, are unknown. You would like to estimate these βs, for if you could, you would then have an equation for predicting a future Y from the associated Xs. Notice that even if the regression coefficients were known, such a prediction would require knowledge of future X values. For example, if t represents time and X_1 = t, then part of our model consists of a simple linear time trend and there will be no surprises when we try to extend the time sequence 1, 2, 3, ..., n into the future. On the other hand, if X_t is the number of incoming phone calls at time t, then extending to time n + 1 would require that some value be inserted for X_{n+1}, and this value will itself likely be a forecast. These two examples represent deterministic and stochastic explanatory variables, respectively.

The nature of the X variables will affect the forecast accuracy - obviously a person forecasting with a known future X is better off than one who must estimate that future X. Thus a problem we will need to deal with, if we want to put some sort of error bounds on our forecasts, is the incorporation of our level of uncertainty about the future X values.

The usual way of estimating the βs is the method used in PROC GLM and PROC REG. The method is referred to as ordinary least squares in that it finds estimates b_j of the parameters that minimize the error sum of squares

SSE = Σ_{t=1}^n (Y_t - b_0 - b_1 X_{1t} - b_2 X_{2t} - b_3 X_{3t})²

This SSE is a function of the estimates, b_j, and much of the subject of calculus is concerned with finding values of arguments, like these b_j, that minimize a function, SSE in our case. Thus we have mathematical tools, relatively easy to implement on the computer, that allow us to find the minimizing values. This is what PROC REG and PROC GLM are set up to do. Furthermore, statistical theory allows us to compute measures of uncertainty, called standard errors, for these b_j estimates and the resulting forecasts if certain conditions are satisfied. Note the expression "if certain conditions are satisfied." It is this with which we are concerned here.
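As a concrete sketch, a model like this one would be fit by ordinary least squares with a call such as the following; the dataset ORDERS and the variable names are illustrative, not data from this paper:

proc reg data=orders;       /* hypothetical dataset with variables y, x1, x2, x3 */
   model y = x1 x2 x3;      /* OLS: finds b0, b1, b2, b3 minimizing SSE */
run;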

In this paper we review these "certain conditions," indicate why they might be violated when data are taken over time, present methods for checking these conditions, and finally present corrections that can be applied if the conditions are violated. The corrections that we speak of are implemented in SAS Institute's PROC AUTOREG.

Throughout the paper we will use an artificial example in which X_t represents the number of phone calls in week t to a mail order company and Y_t is the number of shipments for that week. Figure 1 shows the data over a 3 year period. We are interested in estimating the company's growth, estimating the number of shipments generated per phone call, and forecasting phone calls and sales two weeks into the future.

2. Checking the usual assumptions.

Our model is Y_t = β_0 + β_1 X_t + β_2 t + e_t. We assume

(A) Normality: The errors all come from normal distributions.

(B) Homogeneity: These normal distributions all have mean 0 and the same variance, σ².

(C) Independence: The correlation between e_i and e_j is 0 (for i not equal to j).

We can check the normality assumption by drawing histograms and normal probability plots of the residuals. In figures 2-4, we see a histogram of the residuals; a hanging histogram, in which each bar becomes a line segment at the former bar midpoint, this line being hung from the normal curve rather than rising from the horizontal axis; and a plot of the residuals against their normal scores. These are very easy to produce using the following code:

proc capability graphics;
   histogram r / normal hanging vref=0;
   histogram r / normal;
   qqplot r / normal(mu=est sigma=est);

The plots look reasonably normal and the quantile-quantile plot reasonably straight. PROC CAPABILITY also presents tests of the normality hypothesis, but the theory behind these assumes independence, an assumption we have yet to check.

Not shown is a simple plot of residuals against predicted values. Because this looks uniform, as opposed to megaphone shaped, this check on the homogeneous variance assumption does not give us cause for concern.
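Such a plot is easy to produce as well. A minimal sketch, assuming the residuals and predicted values were saved from the regression (the dataset name SALES and the output names are illustrative):

proc reg data=sales;
   model y = t x;
   output out=resid r=r p=yhat;   /* save residuals and predicted values */
run;
proc plot data=resid;
   plot r*yhat;                   /* a megaphone shape would suggest heterogeneous variance */
run;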

The regression and subsequent calculation of residuals was accomplished with this code:

proc reg; model y = t x / dw;
proc reg; model y = t / dw;

where Y is sales, X phone calls, and t week number. The previous residual analysis was from the first regression. The advantage of the second regression is that only future values of t would be needed for forecasting, whereas for the first model we would need to know, or at least estimate, next week's phone calls to forecast sales.

Notice the dw options. These request the Durbin-Watson statistic, which is a test for autocorrelation, that is, correlation between successive residuals. Autocorrelation is a commonly occurring violation of the independence assumption when data are taken over time. The option also gives an estimate ρ̂ of the first order autocorrelation. We get dw = 1.407 and ρ̂ = .283 for the first model, dw = .969 and ρ̂ = .497 for the second model.

3. The Durbin-Watson statistic and first order autocorrelation.

The Durbin-Watson statistic is

dw = Σ_{t=2}^n (r_t - r_{t-1})² / Σ_{t=1}^n r_t²

where r_t is the residual at time t. If e_t represents white noise (an uncorrelated sequence) then we find these expected values:

E{(e_t - e_{t-1})²} = E{e_t² - 2 e_t e_{t-1} + e_{t-1}²} = σ² + 0 + σ²

E{e_t²} = σ²

Thus Σ_{t=2}^n (e_t - e_{t-1})² / Σ_{t=1}^n e_t² should be near 2; that is, the Durbin-Watson statistic should be near 2 if calculated on a white noise sequence. If there is positive correlation between neighboring e's, then e_t and e_{t-1} would be more alike than in the white noise case, so that e_t - e_{t-1} would be smaller in magnitude and thus dw would move toward 0.

The first order autocorrelation in the residuals is

ρ̂ = Σ_{t=2}^n r_t r_{t-1} / Σ_{t=1}^n r_t²

which is very close to what one would get by simply inserting r_t and r_{t-1} into the standard formula for a correlation. (If, as in our example, the regression contains an intercept, then the mean residual is 0.) It is well known that if ρ̂ is computed on a white noise series and if the sample size n is reasonably large, then

Z = √n ρ̂ / √(1 - ρ̂²)

is approximately a N(0,1) random variable, so that for large samples, values of |Z| exceeding 1.96 would give us reason to suspect that autocorrelation is present. A bit of algebra demonstrates that the Durbin-Watson statistic is roughly equal to 2(1 - ρ̂), so that from our Z we could get a large sample approximate distribution for the Durbin-Watson statistic. The real contribution of Durbin and Watson was to show how to get the exact finite sample distribution of the statistic dw.

Unfortunately the Durbin-Watson theory shows that the exact distribution depends on the values of the X explanatory variables in the regression, so that each new problem encountered would require a new table of critical values. However, if none of the X variables are lagged Y values and the errors are normal, Durbin and Watson were able to calculate bounds that hold for all critical values. Thus if you enter the tables of Durbin and Watson for a certain sample size and number of explanatory variables, you will see upper and lower bounds for the true critical value.

A dw to the left of the lower bound is clearly less than the critical value and thus too close to 0 to accept the independence hypothesis under which dw should be near 2. A dw to the right of the upper bound makes it clear that dw is closer to 2 than is the critical value, so we cannot reject independence. A dw between the bounds just tells you that the calculated dw and the critical value are both between these numbers, so you have no idea how they are placed relative to each other.

Durbin and Watson also gave a computationally intensive way of computing p-values using the observed Xs. We will see how to get p-values from PROC AUTOREG. It should be noted that the restriction that lagged Ys not be included in the explanatory X variables still holds, so that a model with lagged Ys, explicitly or implicitly among the explanatory variables, would not produce exact p-values using the Durbin-Watson method.
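In PROC AUTOREG the relevant request is the DWPROB option; a minimal sketch (the dataset name is illustrative):

proc autoreg data=sales;
   model y = t x / dwprob;   /* Durbin-Watson statistic with its p-value */
run;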

Because the large sample Z approximation is reasonably good, the tables of Durbin and Watson typically do not extend to very large n values, so for our example residuals we use Z = √n ρ̂ / √(1 - ρ̂²), getting

√156 (.283) / √(1 - .283²) = 3.7

for the first model, and

√156 (.497) / √(1 - .497²) = 7.2

for the second model. Using 1.96 as a critical value, we have strong evidence for autocorrelation in our example.
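These quantities are also easy to compute directly from the saved residuals. The following DATA step is a sketch, assuming a dataset RESID containing the OLS residuals in a variable R (as from the earlier OUTPUT statement); the sum statements skip the missing first lagged term automatically:

data dwz;
   set resid end=eof;
   rlag = lag(r);                       /* r(t-1) */
   num   + (r - rlag)**2;               /* sum of (r(t) - r(t-1))**2 over t >= 2 */
   cross + r*rlag;                      /* sum of r(t)*r(t-1) */
   ss    + r*r;                         /* sum of r(t)**2 */
   n     + 1;
   if eof then do;
      dw  = num/ss;                     /* Durbin-Watson statistic */
      rho = cross/ss;                   /* first order autocorrelation */
      z   = sqrt(n)*rho/sqrt(1 - rho**2);  /* large-sample Z */
      put dw= rho= z=;
      output;
   end;
run;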

For our example we have normal residuals with homogeneous variance, but they are clearly not independent for either of our models.

4. Adjusting for autocorrelation.

Suppose we have a simple regression model

Y_t = β_0 + β_1 X_{1t} + a_t

where, instead of white noise e_t, our error term satisfies a model such as

a_t = α_1 a_{t-1} + α_2 a_{t-2} + ... + α_p a_{t-p} + e_t

This error model is called autoregressive of order p, where the order refers to the number of lagged a's appearing in the equation for a_t. If p = 1, then the first order autocorrelation coefficient from PROC REG is a reasonable estimate of α_1, but in higher order models the relationship between the autoregressive coefficients (the αs) and the autocorrelations is much more convoluted.

What happens if we just ignore the autocorrelation?

(A) The estimates of the regression coefficients are still unbiased.

(B) The estimates of the regression coefficients vary more from sample to sample than do the best estimates, but still may be reasonably efficient.

(C) Estimates of standard errors for coefficients, and anything computed from them (t statistics, p-values and confidence intervals for example), are biased - often badly biased.

Using our simple model and an order 1 error process a_t = ρ a_{t-1} + e_t, we note that the equation holds at both times t and t - 1:

Y_t = β_0 + β_1 X_{1t} + a_t

Y_{t-1} = β_0 + β_1 X_{1,t-1} + a_{t-1}

so that, multiplying the second equation through by -ρ, the negative of the autoregressive parameter, and adding, we have the transformed model

Y_t - ρ Y_{t-1} = (1 - ρ) β_0 + β_1 (X_{1t} - ρ X_{1,t-1}) + (a_t - ρ a_{t-1})

Now we note the following points about this transformed model:

(A) It is a linear model in the transformed variables (the quantities in parentheses).

(B) It has the same coefficients as the original model.

(C) It has error term (a_t - ρ a_{t-1}), and we are assuming a_t - ρ a_{t-1} = e_t; that is, this new model satisfies all the usual regression assumptions!

(D) It has n - 1, not n, observations.

Note that point C implies that running an ordinary regression would suffice if we knew ρ or could approximate it well. We can recover the lost observation by noting that

√(1 - ρ²) Y_1 = √(1 - ρ²) β_0 + β_1 √(1 - ρ²) X_{11} + √(1 - ρ²) a_1

This works because Var(a_t) = σ²/(1 - ρ²), so the rescaled error √(1 - ρ²) a_1 has the same variance σ² as the e_t's. What we will do is regress column 1 on columns 2 and 3 in the following table, using the first order autocorrelation of the residuals from an initial regression as an estimate of ρ.

√(1 - ρ²) Y_1       √(1 - ρ²)     √(1 - ρ²) X_1
Y_2 - ρ Y_1         1 - ρ         X_2 - ρ X_1
Y_3 - ρ Y_2         1 - ρ         X_3 - ρ X_2
  .                   .             .
  .                   .             .
Y_n - ρ Y_{n-1}     1 - ρ         X_n - ρ X_{n-1}

These new estimates of the βs can be used to compute new residuals, a new estimate of ρ, new columns in the table, and the whole process can be iterated until the estimates stabilize.
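As a sketch of one pass of this transformation in a DATA step (the dataset name SALES is illustrative, and the 0.283 is the ρ̂ from the initial OLS residuals):

%let rho = 0.283;                      /* rho-hat from the initial regression */
data trans;
   set sales;
   ylag = lag(y);  xlag = lag(x);      /* previous week's values */
   if _n_ = 1 then do;                 /* recovered first observation */
      ystar = sqrt(1 - &rho**2)*y;
      c     = sqrt(1 - &rho**2);
      xstar = sqrt(1 - &rho**2)*x;
   end;
   else do;                            /* rows 2 through n of the table */
      ystar = y - &rho*ylag;
      c     = 1 - &rho;
      xstar = x - &rho*xlag;
   end;
run;
proc reg data=trans;
   model ystar = c xstar / noint;      /* regress column 1 on columns 2 and 3 */
run;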

Alternatively, one can simply notice at the outset that this whole procedure amounts to an attempt to minimize the error sum of squares in a nonlinear model and thus use standard nonlinear search techniques (i.e. full blown maximum likelihood estimation of ρ and the βs simultaneously) instead of the alternating technique above.
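In PROC AUTOREG this maximum likelihood route is available directly through the METHOD= option; a minimal sketch:

proc autoreg data=sales;
   model y = t x / nlag=1 method=ml;   /* rho and the betas estimated simultaneously */
run;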

Either way, we have estimated the model

Y_t = ρ Y_{t-1} + (1 - ρ) β_0 + β_1 X_{1t} - ρ β_1 X_{1,t-1} + e_t

which is seen to be a model that includes lagged Y's among the explanatory variables. Now it is possible that the model needs more than 1 lagged residual. The procedure is essentially the same; we just need more terms in the transformation, and more than 1 observation at the beginning needs to be recovered in a special way.

5. PROC AUTOREG for the sales data.

We use PROC AUTOREG on our sales data with the following code:

proc autoreg;
   model y = t x / nlag=4 backstep dwprob partial;

The output begins with ordinary least squares, used to get the residuals and model them. Next come autocorrelations of the residuals. The lag j autocorrelation is simply an estimate of the correlation between a_t and a_{t-j}, computed from the ordinary least squares residuals r_t (think of r_t as an estimate of a_t). The jth partial autocorrelation can be interpreted as an estimate of the last coefficient in the regression of a_t on a_{t-1}, ..., a_{t-j}, and thus would be 0 after the appropriate number of autoregressive lags are included in the model. Time series experts use autocorrelations and partial autocorrelations to diagnose the nature of correlation in the residuals.

The drop in the autocorrelations after lag 1 is more dramatic than that in the partial autocorrelations. This suggests that a model other than autoregressive for the error terms might be considered, thus taking us into the realm of PROC ARIMA, which we discuss later.

Estimates of Autocorrelations

Lag   Covariance    Correlation
 0    446.1458       1.000000
 1    126.196        0.282858
 2    -17.9134      -0.040151
 3     27.51148      0.061665
 4      8.913212     0.019978

Partial Autocorrelations

Lag   Correlation
 1     0.282858
 2    -0.130610
 3     0.123244
 4    -0.047635

The next part of the output is a backward elimination of insignificant autoregressive parameters (the α's), starting with the least significant.

Backward Elimination of Autoregressive Terms

Lag   Estimate     t-Ratio   Prob
 4     0.047635     0.5821   0.5614
 3    -0.123244    -1.5210   0.1304
 2     0.130610     1.6188   0.1076

We are left, then, with just a lag 1 autoregressive model. The procedure next summarizes the model:

Estimates of the Autoregressive Parameters

Lag   Coefficient    Std Error   t Ratio
 1    -0.28285812    0.077798    -3.636

The autoregressive coefficient -0.2829 divided by its standard error gives t = -3.636 and, since our sample size is reasonably large, a t exceeding 1.96 in magnitude is considered significant. The error mean square 419.7 estimates σ², the variance of e_t, so our error model is

a_t = .2829 a_{t-1} + e_t

(PROC AUTOREG writes the autoregressive parameters on the other side of the equation, which is why the printed coefficient is -0.2829 while the coefficient in our error model is +.2829.)

The procedure next uses the estimated ρ to get the transformed variables, e.g. Y_t - 0.2829 Y_{t-1}, and runs ordinary least squares on the transformed variables. Because of the transformation, the resulting coefficients, standard errors, etc. are correct and, except for the fact that ρ is estimated instead of known, the X coefficients are the best linear unbiased estimates of the parameters.

Yule-Walker Estimates

SSE            63800.85    DFE         152
MSE            419.7425    Root MSE    20.48762
SBC            1401.124    AIC         1388.924
Reg Rsq        0.7101      Total Rsq   0.7764
Durbin-Watson  1.8850      PROB < DW   0.2013

The SBC and AIC are information criteria that can be used to compare models with differing numbers of parameters. The model delivering the smallest information criterion would be selected. The criteria trade the fit of the model off against its complexity, just as a person might trade the functionality of a new computer against its cost in deciding which machine to purchase.

We observe that the Durbin-Watson statistic on the transformed model is now close to 2, and an approximate p-value larger than 0.05 appears. This is not an exact p-value, since the transformed model implicitly uses an estimated coefficient on lagged Y's to predict the current Y and thus does not satisfy Durbin and Watson's assumptions.

Finally there are 2 R-square values. This is because, in predicting Y one step ahead, we can use a prediction based on the X's and their coefficients only, or we can add to that a forecast of the next residual based on our autoregressive error model and the most recently observed residuals. The percentages of variation explained under these scenarios are the regression R-square and total R-square respectively. In that sense, the total R-square is a predictability R-square, while the regression R-square tells how much of the predictability is associated with the X variables, which may be difficult or expensive to obtain - especially future values of them.

The model parameters part of the output is given next. A portion of it is shown here:

Variable    DF   B Value      t Ratio   Approx Prob
Intercept    1   5.571991      0.882    0.3793
T            1   0.038574      0.732    0.4656
X            1   0.947411     18.220    0.0001

We see that the time trend (T), which seemed to appear in our graph, does not seem important after X is included in our model. Note that this does not say that there is no increase in sales, only that there is no increase beyond what would have been predicted from incoming phone calls. Phone calls (X) are strongly significant. We might have been happier had the significance results been reversed, since we need to supply next week's phone calls to predict next week's sales in a model involving X.

The three models estimated by PROC AUTOREG, with standard errors in parentheses, are:

Y_t = 5.5720 + 0.0386 t + 0.9474 X_t
     (6.3191)  (0.0527)   (0.052)

Y_t = 7.3681 + 0.9589 X_t
     (5.8111)  (0.0498)

Y_t = 83.8472 + 0.3388 t
     (11.0956)  (0.1222)

Choosing SBC as a criterion we have

Model      SBC    MSE     AR(1)
X and t    1401    420    -.2829
X only     1397    418    -.2873
t only     1607   1658    -.4972

so the model with X only is clearly preferred.
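For reference, one way to produce these three fits is simply to run PROC AUTOREG once per candidate model (a sketch; the dataset name is illustrative). Each run prints its own SBC, MSE, and autoregressive coefficient, the kind of output summarized in the table above:

proc autoreg data=sales;
   model y = t x / nlag=1;   /* X and t */
run;
proc autoreg data=sales;
   model y = x / nlag=1;     /* X only */
run;
proc autoreg data=sales;
   model y = t / nlag=1;     /* t only */
run;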

One confusing phenomenon that often occurs is that, although statistical theory indicates that the estimates from our transformed model should be better than the ordinary least squares estimates that ignore autocorrelation, the ordinary least squares printout shows smaller standard errors than the ones shown in PROC AUTOREG. How can that be if the PROC AUTOREG method provides better estimates? The answer is simple - the ordinary least squares numbers are not good estimates of the standard errors and are often, but not always, deceptively small. In other words, by ignoring autocorrelation we are using inferior estimates of the parameters, but the standard errors falsely indicate that they are superior.

6. Forecasting

What are our choices in terms of forecasting? One option is to somehow get estimates of future X values. Here are some forecasts of phone volume which actually came from SAS PROC ARIMA. They have been appended to our original data, and you see their Y values are missing.

T      DATE        X         Y
154    12/16/94    153.000   141
155    12/23/94    146.000   159
156    12/30/94    180.000   220
157    01/06/95    157.292     .
158    01/13/95    141.705     .
159    01/20/95    131.008     .
160    01/27/95    123.665     .
161    02/03/95    118.625     .

Do we really think X will be 157.292 next week? Of course not; this is just a forecast, so the use of this X value in computing a forecast for Y will add to the inaccuracy of the forecast. On the other hand, since we stopped in week 156, the use of t = 157 for next week's t in our model will introduce no inaccuracy in the forecast.

The model with only X is

Y_t = 7.3681 + .9589 X_t + a_t
     (5.8111)   (0.050)

with a_t = .2873 a_{t-1} + e_t, and we are at the last week of year 3 in our data, so t = 156. Standard errors are in parentheses. The last observation was Y_156 = 220 and X_156 = 180, so the residual, an estimate of a_156, would be r_156 = 220 - 7.3681 - .9589(180) = 40.030, and we predict a_157 as .2873(40.030) = 11.501. Now if we assume X_157 will be 157.292, then we predict Y_157 to be 7.3681 + .9589(157.292) + 11.501 = 158.195 + 11.501 = 169.696. This is the first forecast in the PROC AUTOREG output dataset, and an associated standard error (20.9) is used to compute a 95% prediction interval. The problem is that this standard error is computed assuming that X will be exactly 157.292 in the next time interval. Our true level of uncertainty in the forecast of Y would certainly be influenced by the variability in our imputed X value.
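A minimal sketch of how such forecasts can be requested, assuming the future X rows have been appended to a dataset called ALL (the dataset and output names are illustrative): rows with missing Y but nonmissing X receive predicted values, and the interval half-widths here reflect only the error model, not the uncertainty in the forecasted X's.

proc autoreg data=all;
   model y = x / nlag=1;
   output out=fcst p=yhat lcl=l95 ucl=u95;   /* forecasts and prediction limits */
run;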

7. Using PROC ARIMA

We can model the whole process, X and Y, in PROC ARIMA. We first compute a model for X, which PROC ARIMA estimates as

X_t - 107.6 = .6864 (X_{t-1} - 107.6) + e_t

Now PROC ARIMA can fit and forecast the same model as PROC AUTOREG; however, it gives you the option of using the X model to provide forecasts of future X's, and it incorporates the uncertainty in those X's in the Y forecasts. Our forecasts of future X's were from PROC ARIMA, so the forecasts from the two procedures will match, but the forecast standard errors will differ. We close by presenting graphs of forecasts and their standard errors from several scenarios.

In figure 5, the forecasts from PROC AUTOREG and PROC ARIMA with their intervals are overlaid. The difference in widths of the forecast intervals illustrates the magnitude of error that is being ignored when one treats future X's as known when they are in fact forecasts.

Figure 6 shows the forecasts and intervals from the model that uses time t as the explanatory variable. Here we can properly treat future t's as known, but pay a price in that the model does not fit as well. The forecasted X ARIMA plot is overlaid on this (it gives the lower forecasts and intervals), and it is interesting to note that although this model fit substantially worse according to our statistical tests, once we admit that there is error in our forecasts of X, our forecasts and error bands are similar in both models. In the long term, the model with t will give forecasts that trend linearly upward, while the forecasts with the ARIMA model will return to the historic mean, as will always happen with a stationary ARIMA model.

Finally, PROC ARIMA allows the fitting of a larger class of error models than autoregressive. A lag 1 moving average model also provides an excellent fit to the error series for the sales data. Using the autoregressive model for X, a linear relationship between Y and X, and an order 1 moving average error term, our estimated model becomes

Y_t = 8.607 + 0.95 X_t + e_t + .37 e_{t-1}

X_t - 107.6 = 0.686 (X_{t-1} - 107.6) + u_t

Because the error term e_t + .37 e_{t-1} is a moving average, we estimated this model in PROC ARIMA:

proc arima data=a;
   i var=x noprint;  e p=1 ml;
   i var=y crosscor=(x) nlag=5;
   e input=(x) q=1 ml;
   f lead=5 out=out5 id=t;

Because our input variable is modeled, we will get a crosscorrelation plot which has been prewhitened. It is a plot of correlations between Y at time t and X at time t - j which has been cleared of any indirect correlations caused by autocorrelation in the X series.

It is thus a picture of how changes in X are incorporated into the Y series. The plot of the crosscorrelations for our example will look similar to this:

Crosscorrelations

Lag    Corr
 -5   -0.065     .  *|    .
 -4    0.043     .   |*   .
 -3   -0.090     . **|    .
 -2   -0.060     .  *|    .
 -1    0.008     .   |    .
  0    0.797     .   |**********
  1    0.057     .   |*   .
  2   -0.010     .   |    .
  3   -0.053     .  *|    .
  4    0.135     .   |***
  5   -0.124     . **|    .

"." marks two standard errors

It is seen that there is no lagged correlation, only contemporaneous correlation. The moving average error structure gave error mean square 408.7, as compared to 418.3, so it is the best fit of all models considered here by that criterion.

SAS is the registered trademark of SAS Institute Inc. in the USA and other countries. ™ indicates USA registration.