
Introduction to Time Series Models

James L. Powell Department of Economics University of California, Berkeley

Overview

In contrast to the classical model, in which the components of the dependent variable vector y are not identically distributed (because its mean vector varies with the regressors) but may be independently distributed, time series models have dependent variables which may be identically distributed, but are typically not independent across observations. Such models are applicable for data that are collected over time. A leading example, to be discussed more fully below, is the first-order autoregression model, for which the dependent variable y_t for time period t satisfies

y_t = α + ρ y_{t−1} + ε_t,

for ε_t satisfying the assumptions of the error terms in a classical linear regression model (i.e., mean zero, constant variance, and uncorrelated across t). For some values of the parameters, y_t will have constant mean and variance over t, but the covariance between y_t and y_s will generally be nonzero when t ≠ s. The first-order autoregression model can be viewed as a special case of a dynamic regression model, with

y_t = α + ρ y_{t−1} + x_t'β + ε_t,

with x_t a vector of regressors.

The usual purpose of these models is prediction; given the recursive structure of such models, realizations of the dependent variable today are useful in predicting its value in the future. In much of time series modeling, the values of the parameters themselves are not the objects of interest, but rather the ability of the specified model to forecast out of the observed sample; thus, much of the statistical methodology is devoted to finding a “good” model for the data rather than the “right” model.
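To make the recursive structure concrete, here is a minimal NumPy sketch (the parameter values α = 1, ρ = 0.8, σ = 1 and the use of NumPy are illustrative assumptions, not part of the notes) that simulates the first-order autoregression above and forms a one-step-ahead forecast:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameter values (not from the text)
alpha, rho, sigma = 1.0, 0.8, 1.0
T = 500

# Simulate y_t = alpha + rho * y_{t-1} + eps_t with white-noise errors
y = np.empty(T)
y[0] = alpha / (1 - rho)            # start at the stationary mean
eps = rng.normal(0.0, sigma, size=T)
for t in range(1, T):
    y[t] = alpha + rho * y[t - 1] + eps[t]

# The recursive structure makes today's value useful for prediction:
# the one-step-ahead forecast of the next observation is alpha + rho * y_T.
forecast = alpha + rho * y[-1]
print(f"last observation: {y[-1]:.3f}, one-step-ahead forecast: {forecast:.3f}")
```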

Stationarity and Ergodicity

The standard framework for time series data views the sequence of dependent variables {y_t} as a discrete time stochastic process, i.e., a realization of a random function whose argument is the (integer) time index t. (Unless stated otherwise, the discussion here will assume y_t is scalar; the index t is sometimes restricted to be nonnegative or positive.) More formally, if (Ω, F, P) is a probability space with sample space Ω, set of events F, and probability measure P, then a discrete time stochastic process y_t = y_t(ω) is a mapping from the integers N and sample space Ω into the real line R:

y : N × Ω → R.

Unless stated otherwise, it will be assumed that the first two moments of y_t exist, with

μ_t = E[y_t],

σ_t² = E(y_t − μ_t)².

If we could observe a sequence of i.i.d. realizations y_t(ω_i) of the underlying probability space, then by the strong law of large numbers, the ensemble averages

ỹ_t ≡ (1/N) Σ_{i=1}^{N} y_t(ω_i)

would converge to μ_t with probability one as N → ∞, as would the corresponding sample variance. However, without restrictions on the variation of the joint distribution of y_t over t – so that, for example, the means and variances of y_t could vary freely over t – it would clearly be impossible to construct consistent estimators of those parameters with a single realization ω = ω_1 of history. The concept of stationarity imposes such restrictions.
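To illustrate the distinction between ensemble averages and the single-history time averages we actually observe, here is a small simulation (it anticipates the stationary first-order autoregression discussed below; all parameter values and the use of NumPy are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stationary AR(1) used only for illustration
alpha, rho, sigma = 1.0, 0.5, 1.0
mu = alpha / (1 - rho)                       # common mean of every y_t

def simulate(T, rng):
    """Simulate one realized history of length T, starting from the stationary distribution."""
    y = np.empty(T)
    y[0] = mu + rng.normal(0.0, sigma / np.sqrt(1 - rho**2))
    for t in range(1, T):
        y[t] = alpha + rho * y[t - 1] + rng.normal(0.0, sigma)
    return y

T, N = 200, 2000
histories = np.array([simulate(T, rng) for _ in range(N)])   # N independent realizations

# Ensemble average across realizations at a fixed date t (infeasible with one history)
print("ensemble average at t = 100:", histories[:, 100].mean())
# Time average along a single realized history (what a time series sample provides)
print("time average of one history:", histories[0].mean())
print("true mean mu:", mu)
```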

The process {y_t} is said to be weakly stationary (or covariance stationary) if the second moments of y_t exist, and the first and second moments satisfy

E(y_t) = μ,

Var(y_t) = σ² ≡ γ_y(0),

Cov(y_t, y_s) = γ_y(t − s) = γ_y(|t − s|).

That is, the mean values of y_t are constant, and the covariance between any pair y_t and y_s of observations depends only on the (absolute) difference of their indices |t − s|. By reducing the means, variances, and covariances between pairs of observations to a small set of time-invariant parameters, there is some hope of consistently estimating those parameters with a single realization of the process {y_t}. The function γ_y(s) is called the autocovariance function of the y_t process.

A “stronger” definition of stationarity, suggestively titled strong stationarity, restricts the joint distribution of any finite collection of consecutive realizations of y_t to be invariant across t, in the sense that

Pr{(y_t, y_{t+1}, ..., y_{t+K}) ∈ B} = Pr{(y_0, y_1, ..., y_K) ∈ B}

for any nonnegative integer K and corresponding event B. This is not, strictly speaking, stronger than weak stationarity without the additional condition that the second moment of y_t is finite, with which it does indeed imply covariance stationarity. For the theoretical development, when deriving the mean-squared error of forecasts, etc., the assumption of weak stationarity usually suffices; when deriving probability limits and asymptotic distributions for estimators, typically strong stationarity (or a similar strengthening of weak stationarity) is assumed.

Since econometric modeling typically involves characterization of relationships for several variables, it is useful to extend the notion of stationarity to vector processes, where y_t ∈ R^M for some M > 1. Such a process is covariance stationary if

E(y_t) = μ,

Var(y_t) = Σ ≡ Γ_y(0),

Cov(y_t, y_s) = E[(y_t − μ)(y_s − μ)'] ≡ Γ_y(t − s) = [Γ_y(s − t)]'.

Extension of the concept of strong stationarity to vector processes is similarly straightforward.

Even if a scalar dependent variable y_t is stationary, it need not be true that a law of large numbers applies, i.e., stationarity does not imply that

ȳ_T ≡ (1/T) Σ_{t=1}^{T} y_t →p E(y_t) = μ.

If this condition is satisfied, y_t is said to be (weakly) ergodic; it is said to be strongly ergodic if

(1/T) Σ_{t=1}^{T} f(y_t, y_{t+1}, ..., y_{t+K}) →a.s. E[f(y_t, y_{t+1}, ..., y_{t+K})]

whenever the latter moment exists. It is easy to construct examples of stationary processes which are not ergodic; for example, if

y_t ≡ z ~ N(μ, 1),

then y_t is clearly (weakly and strongly) stationary, but ȳ_T ≡ z ≠ μ with probability one. Another example is

y_t = z_1 ~ N(μ, 1) when t is even, and y_t = z_2 ~ N(μ, 1) when t is odd,

where z_1 and z_2 are independent. Such processes are special cases of deterministic processes, which can be perfectly predicted by a linear combination of past values:

y_t = δ_1 y_{t−1} + δ_2 y_{t−2} + ....

For the first process, δ_1 = 1 and the rest are zero, while for the second, only δ_2 = 1 is nonzero; in general, the coefficients must sum to one to ensure stationarity. In practice, seasonal factors (which are periodically-recurring “constant terms”) are good examples of deterministic processes; for these processes, it is usually possible to consistently estimate the realized values (from noisy data), but not the parameters of the distributions which generated them.
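A short simulation of the two non-ergodic examples above (a minimal sketch; the sample size and random seed are arbitrary) confirms that the time average settles on the realized draws rather than on μ:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, T = 0.0, 10_000

# First example: y_t = z ~ N(mu, 1) for every t, so the time average equals z, not mu
z = rng.normal(mu, 1.0)
y_constant = np.full(T, z)
print("time average:", y_constant.mean(), " realized z:", z)

# Second example: y_t alternates between two independent N(mu, 1) draws z1 and z2
z1, z2 = rng.normal(mu, 1.0, size=2)
y_alternating = np.where(np.arange(T) % 2 == 0, z1, z2)
print("time average:", y_alternating.mean(), " (z1 + z2)/2:", (z1 + z2) / 2)
```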

For a process to be ergodic, some measure of the dependence between observations y_t and y_s must vanish as |t − s| increases; if so, a law of large numbers should be applicable. For weakly stationary processes, a sufficient condition for ȳ_T →p μ is that the autocovariance function declines to zero for large arguments, i.e., γ_y(s) → 0 as s → ∞; the proof of this result is a useful exercise. However, stronger restrictions on dependence between observations and existence of moments are needed to obtain central limit theorems; some of the restrictions on dependence go under the headings “mixing conditions” or “martingale difference sequences.” Such conditions will be considered later; suffice it to say that, for all of the nondeterministic processes considered below, it is possible to find sufficient regularity conditions to ensure that a central limit theorem applies:

√T (ȳ_T − μ) →d N(0, V_0),

where V_0 is the limit of the variance of the normalized average √T (ȳ_T − μ),

V_0 = lim_{T→∞} Var(√T (ȳ_T − μ)) = Σ_{s=−∞}^{∞} γ_y(s).

For i.i.d. observations, V_0 reduces to the usual γ_y(0) = Var(y_t) ≡ σ_y². All of these results will extend to the case when y_t is a random vector; in this case, a dependent CLT for vector stochastic processes would yield

√T (ȳ_T − μ) →d N(0, V_0),

with the asymptotic covariance matrix V_0 defined as

V_0 ≡ Σ_{s=−∞}^{∞} Γ_y(s),

for Γ_y(s) the autocovariance matrix defined above.

Moving Average Processes

A flexible class of models for (possibly) stationary univariate time series, proposed by Box and Jenkins in the mid-1960s, are autoregressive moving average models – ARMA models for short – which will be discussed more fully below. The fundamental building block for ARMA models is a white noise process, which is just a colorful mixed metaphor (light and sound) for a stochastic process ε_t which satisfies the properties imposed upon error terms in the standard linear regression model.

White Noise Process: The process {ε_t} is called a white noise process with parameter σ², denoted ε_t ~ WN(σ²), if it is weakly stationary with

E(ε_t) = 0,

Var(ε_t) = σ²,

Cov(ε_t, ε_s) = 0 if t ≠ s.

From this simplest example of a weakly stationary and weakly ergodic process (which is strongly stationary if ε_t is assumed to be i.i.d.), it is possible to build other processes y_t with more interesting autocovariance patterns by assuming y_t is generated by a linear combination of its past values plus a linear combination of current and past values of a white noise error term ε_t. First are the moving average processes, which are linear combinations of white noise “error terms” across different time periods.

First-order Moving Average Process: The process y_t is a first-order moving average process, denoted y_t ~ MA(1), if it can be written as

y_t = μ + ε_t + θ ε_{t−1},

where ε_t ~ WN(σ²) (and μ, θ, and σ² > 0 are constants, possibly unknown).

It is easy to see that an MA(1) process is covariance stationary, with mean E(y_t) = μ and autocovariance function

γ(0) = Var(y_t) = σ²(1 + θ²),

γ(1) = Cov(y_t, y_{t−1}) = σ²θ,

γ(s) = Cov(y_t, y_{t−s}) = 0 if s > 1.
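These MA(1) moments are easy to verify by simulation; the sketch below (with hypothetical values θ = 0.6 and σ² = 1, and a helper function name of my own choosing) compares sample autocovariances of a long simulated series to the formulas above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical MA(1): y_t = mu + eps_t + theta * eps_{t-1}
mu, theta, sigma = 0.0, 0.6, 1.0
T = 200_000
eps = rng.normal(0.0, sigma, size=T + 1)
y = mu + eps[1:] + theta * eps[:-1]

def sample_autocov(x, s):
    """Sample autocovariance: average of (x_t - xbar)(x_{t-s} - xbar)."""
    xc = x - x.mean()
    return np.mean(xc[s:] * xc[:len(xc) - s])

print("gamma(0):", sample_autocov(y, 0), " theory:", sigma**2 * (1 + theta**2))
print("gamma(1):", sample_autocov(y, 1), " theory:", sigma**2 * theta)
print("gamma(2):", sample_autocov(y, 2), " theory:", 0.0)
```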

Generalizations of the MA(1) process include more lagged white noise components on the right-hand side of the equation for y_t:

q-th-order Moving Average Process: The process y_t is a q-th-order moving average process, denoted y_t ~ MA(q), if it can be written as

y_t = μ + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q},

where ε_t ~ WN(σ²).

Straightforward calculations show that the moments of this MA(q) process are

E(y_t) = μ,

γ(0) = Var(y_t) = σ²(1 + θ_1² + ... + θ_q²),

γ(1) = Cov(y_t, y_{t−1}) = σ²(θ_1 + θ_1θ_2 + ... + θ_{q−1}θ_q),

...

γ(s) = Cov(y_t, y_{t−s}) = 0 if s > q.

So the autocovariance function γ(s) of an MA(q) process “cuts off” after q lags; conversely, if a weakly stationary process has an autocovariance sequence that is zero after q lags, then it can be represented as an MA(q) process for some white noise sequence ε_t. If the white noise components {ε_t} are mutually independent, and not just serially uncorrelated, then the corresponding MA(m) process is said to be m-dependent, since y_t and y_{t−s} are independent if s exceeds m.
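The cut-off property can be encoded directly from the MA(q) autocovariance formula γ(s) = σ² Σ_j θ_j θ_{j+s} with θ_0 = 1; the helper below (an illustrative function and parameter naming, not from the notes) returns the theoretical autocovariances and is zero beyond lag q:

```python
import numpy as np

def ma_autocov(thetas, sigma2, s):
    """Autocovariance gamma(s) of an MA(q) process with coefficients
    (theta_1, ..., theta_q): gamma(s) = sigma^2 * sum_j theta_j * theta_{j+s}
    with theta_0 = 1, and gamma(s) = 0 for s > q (the 'cut-off' property)."""
    th = np.concatenate(([1.0], np.asarray(thetas, dtype=float)))  # prepend theta_0 = 1
    q = len(th) - 1
    if s > q:
        return 0.0
    return sigma2 * np.sum(th[: q - s + 1] * th[s:])

# Hypothetical MA(2) with theta = (0.5, -0.3) and sigma^2 = 1
for s in range(5):
    print(s, ma_autocov([0.5, -0.3], 1.0, s))   # zero for s = 3, 4, ...
```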

A more compact notation for the MA(q) equation involves lag polynomials. Defining the lag operator L as

L y_t ≡ y_{t−1}

(i.e., application of the lag operator shifts the subscript backward one period), lags of y_t can be expressed in terms of powers of L:

L² y_t ≡ L(L y_t) = L(y_{t−1}) = y_{t−2},
...
L^s y_t ≡ y_{t−s}.

Similarly, leads of y_t can be expressed in terms of negative integer powers of L:

L^{−s} y_t ≡ y_{t+s}.

With this notation, an MA(q) process can be expressed as

y_t = μ + θ(L) ε_t,

where θ(L) is the “lag polynomial”

θ(L) ≡ 1 + θ_1 L + θ_2 L² + ... + θ_q L^q,

with the interpretation that

θ(L) ε_t ≡ (1 + θ_1 L + θ_2 L² + ... + θ_q L^q) ε_t
= ε_t + θ_1 L ε_t + θ_2 L² ε_t + ... + θ_q L^q ε_t
= ε_t + θ_1 ε_{t−1} + θ_2 ε_{t−2} + ... + θ_q ε_{t−q}.

In the jargon of time series analysis, a linear combination of a time series y_t at different leads and lags is said to be a linear filter of y_t, and the lag polynomial θ(L) is called a one-sided linear filter, since it involves only nonnegative powers of the lag operator L, meaning that it only involves current and past values of the process it premultiplies.
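As an illustration of a one-sided linear filter, the sketch below (using NumPy's convolution, an implementation choice rather than anything in the notes) applies θ(L) = 1 + θ_1 L + θ_2 L² to a white noise draw and checks one filtered value by hand:

```python
import numpy as np

rng = np.random.default_rng(4)

def apply_lag_polynomial(theta, x):
    """Apply the one-sided filter theta(L) = 1 + theta_1 L + ... + theta_q L^q to x.
    Returns y_t = x_t + theta_1 x_{t-1} + ... + theta_q x_{t-q} for t = q, ..., T-1;
    the first q values are dropped because their lags are unobserved."""
    coeffs = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    # 'valid' mode lines up x_t, x_{t-1}, ..., x_{t-q} with the filter coefficients
    return np.convolve(x, coeffs, mode="valid")

eps = rng.normal(size=10)
y = apply_lag_polynomial([0.5, 0.25], eps)       # hypothetical theta_1, theta_2
# Check one entry by hand: y[0] corresponds to t = 2
print(np.isclose(y[0], eps[2] + 0.5 * eps[1] + 0.25 * eps[0]))   # True
```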

Autoregressive Processes

Another simple class of time series models are purely autoregressive processes, which involve only a single, contemporaneous white noise term, but instead involve linear combinations of lagged values of the output process (i.e., a one-sided linear filter of y_t). Unlike moving average processes, these are not weakly stationary for all possible values of the parameters.

First-order Autoregressive Process: The process y_t is first-order autoregressive, denoted y_t ~ AR(1), if it satisfies

y_t = α + ρ y_{t−1} + ε_t,

where ε_t ~ WN(σ²) and Cov(ε_t, y_{t−s}) = 0 if s ≥ 1.

Not all AR(1) processes are stationary; if the process is stationary, then E(y_t) = E(y_{t−1}), implying

E(y_t) = α + ρ E(y_t)
= α / (1 − ρ),

which requires ρ ≠ 1. Furthermore, Var(y_t) = Var(y_{t−1}), which implies

Var(y_t) = ρ² Var(y_{t−1}) + Var(ε_t) + 2ρ Cov(y_{t−1}, ε_t)
= ρ² Var(y_t) + σ²
= σ² / (1 − ρ²),

which is only well-defined and nonnegative if |ρ| < 1. This latter condition is sufficient for weak stationarity of y_t; calculations analogous to those for the variance yield

γ_y(s) = Cov(y_t, y_{t−s}) = σ² ρ^s / (1 − ρ²) = ρ^s Var(y_t),

so the covariance between y_t and y_{t−s} declines geometrically as s increases; if ρ is negative, the autocovariance function oscillates between positive and negative values.
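A simulation check of these AR(1) moments (with hypothetical values α = 0.5, ρ = 0.7, σ = 1; the code is a sketch, not part of the notes) is straightforward:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical stationary AR(1): y_t = alpha + rho * y_{t-1} + eps_t, |rho| < 1
alpha, rho, sigma = 0.5, 0.7, 1.0
T = 200_000
y = np.empty(T)
y[0] = alpha / (1 - rho)
eps = rng.normal(0.0, sigma, size=T)
for t in range(1, T):
    y[t] = alpha + rho * y[t - 1] + eps[t]

yc = y - y.mean()
print("mean     :", y.mean(), " theory:", alpha / (1 - rho))
print("variance :", y.var(), " theory:", sigma**2 / (1 - rho**2))
for s in (1, 2, 3):
    gamma_hat = np.mean(yc[s:] * yc[:-s])
    print(f"gamma({s}) :", gamma_hat, " theory:", rho**s * sigma**2 / (1 - rho**2))
```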

Generalizations of the AR(1) process include more lagged dependent variables on the right-hand side of the equation for y_t:

p-th-order Autoregressive Process: The process y_t is p-th-order autoregressive, denoted y_t ~ AR(p), if it satisfies

y_t = α + ρ_1 y_{t−1} + ... + ρ_p y_{t−p} + ε_t
≡ α + ρ(L) y_{t−1} + ε_t,

where ε_t ~ WN(σ²),

ρ(L) ≡ ρ_1 + ρ_2 L + ... + ρ_p L^{p−1},

and Cov(ε_t, y_{t−s}) = 0 if s ≥ 1.

Another expression for the AR(p) generating equation is

φ(L) y_t = α + ε_t,

with

φ(L) ≡ 1 − ρ(L) L,

so that a process is autoregressive if a one-sided linear filter of the process is a white noise process (plus a constant term).

The conditions for stationarity of this process are related to the conditions for stability of the corresponding deterministic difference equation

y_t = α + ρ_1 y_{t−1} + ... + ρ_p y_{t−p};

specifically, the AR(p) process is stationary if every (real or complex) root z of the associated polynomial equation

0 = φ*(z) ≡ z^p φ(z^{−1}) = z^p − ρ_1 z^{p−1} − ... − ρ_{p−1} z − ρ_p

is inside the unit circle, i.e., |z| < 1. The derivation of this condition for stationarity is the subject of another lecture (as is the form of the autocovariance function for a stationary AR(p) process). Nonetheless, it is useful to check the plausibility of the general stationarity condition by verifying it is the same condition |ρ| < 1 for the AR(1) model; in this special case, φ*(z) = z − ρ, whose single root z = ρ must be less than one in magnitude to ensure stationarity, for the reasons outlined above.
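The root condition is easy to check numerically; the sketch below (the function name is illustrative) forms the coefficients of the polynomial z^p − ρ_1 z^{p−1} − ... − ρ_p and tests whether all of its roots lie strictly inside the unit circle:

```python
import numpy as np

def ar_is_stationary(rhos):
    """Check the AR(p) stationarity condition by computing the roots of
    z^p - rho_1 z^(p-1) - ... - rho_p (the polynomial phi*(z) above) and
    verifying that every root lies strictly inside the unit circle."""
    rhos = np.asarray(rhos, dtype=float)
    coeffs = np.concatenate(([1.0], -rhos))   # highest power of z first
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0)), roots

# AR(1) with rho = 0.9: single root z = 0.9, so the process is stationary
print(ar_is_stationary([0.9]))
# AR(2) with rho_1 = 0.5, rho_2 = 0.6: one root lies outside the unit circle
print(ar_is_stationary([0.5, 0.6]))
```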

ARMA Processes

The autoregressive and moving average processes can be combined to obtain a very flexible class of univariate processes (proposed by Box and Jenkins), known as ARMA processes.

ARMA(p, q) Process: The time series y_t is an ARMA(p, q) process, written y_t ~ ARMA(p, q), if

y_t = α + ρ_1 y_{t−1} + ... + ρ_p y_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}
= α + ρ(L) y_{t−1} + θ(L) ε_t,

or

φ(L) y_t = α + θ(L) ε_t,

where ε_t ~ WN(σ²) and Cov(ε_t, y_{t−s}) = 0 if s ≥ 1. The requirements for stationarity of this process are the same as for stationarity of the corresponding AR(p) process.
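For completeness, here is a simple simulator of the ARMA(p, q) recursion (a sketch with an arbitrary burn-in period and illustrative parameter values; it assumes the supplied autoregressive coefficients satisfy the stationarity condition):

```python
import numpy as np

def simulate_arma(alpha, rhos, thetas, sigma, T, burn=500, seed=0):
    """Simulate y_t = alpha + sum_i rho_i y_{t-i} + eps_t + sum_j theta_j eps_{t-j}
    with Gaussian white noise; a burn-in period is discarded so that the
    arbitrary starting values matter little."""
    rng = np.random.default_rng(seed)
    p, q = len(rhos), len(thetas)
    n = T + burn
    eps = rng.normal(0.0, sigma, size=n)
    y = np.zeros(n)
    for t in range(max(p, q), n):
        ar_part = sum(rhos[i] * y[t - 1 - i] for i in range(p))
        ma_part = sum(thetas[j] * eps[t - 1 - j] for j in range(q))
        y[t] = alpha + ar_part + eps[t] + ma_part
    return y[burn:]

# Hypothetical ARMA(1, 1) with rho = 0.6 and theta = 0.4
y = simulate_arma(alpha=1.0, rhos=[0.6], thetas=[0.4], sigma=1.0, T=1000)
print(y.mean())   # approximately alpha / (1 - rho) = 2.5
```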

Linear Processes and the Wold Decomposition

If we permit the order q of an MA(q) process to increase to infinity – that is, if we write

y_t = μ + Σ_{j=0}^{∞} θ_j ε_{t−j}

with ε_t ~ WN(σ²) and θ_0 ≡ 1 – we obtain what is known as a linearly indeterministic process, denoted y_t ~ MA(∞). This process is well-defined (in a mean-squared error sense) if the sequence of moving average coefficients {θ_j} is square-summable,

Σ_{j=0}^{∞} θ_j² < ∞,

in which case it is easy to see that

γ_y(0) = Var(y_t) = σ² Σ_{j=0}^{∞} θ_j²,

and, more generally,

γ_y(s) = σ² Σ_{j=0}^{∞} θ_j θ_{j+s}.

By recursion, stationary ARMA processes can be written as linearly indeterministic processes; for example, a stationary AR(1) process y_t = α + ρ y_{t−1} + ε_t has θ_s = ρ^s. Conversely, the MA coefficients for any linearly indeterministic process can be arbitrarily closely approximated by the corresponding coefficients of a suitable ARMA process of sufficiently high order.
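The θ_s = ρ^s representation of a stationary AR(1) can be checked by comparing the direct recursion with a truncated MA(∞) construction (the truncation point J and parameter values below are arbitrary illustrative choices, and the use of NumPy convolution is my own implementation shortcut):

```python
import numpy as np

rng = np.random.default_rng(6)

# A stationary AR(1), y_t = alpha + rho*y_{t-1} + eps_t, as a truncated MA(infinity)
alpha, rho, sigma = 1.0, 0.8, 1.0
mu = alpha / (1 - rho)
T, J = 300, 200                      # J = truncation point; rho**J is negligible here
eps = rng.normal(0.0, sigma, size=T + J)

# Direct recursion (the first J observations serve as a burn-in for comparison)
y = np.empty(T + J)
y[0] = mu
for t in range(1, T + J):
    y[t] = alpha + rho * y[t - 1] + eps[t]

# Truncated MA(infinity) representation: y_t ~ mu + sum_{s=0}^{J} rho^s eps_{t-s}
thetas = rho ** np.arange(J + 1)     # theta_s = rho^s, square-summable since |rho| < 1
y_ma = mu + np.convolve(eps, thetas, mode="valid")   # entry i corresponds to t = J + i

print(np.max(np.abs(y[J:] - y_ma)))  # tiny: the two representations (nearly) coincide
```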

Wold showed that all covariance stationary stochastic processes could be decomposed as the sum of deterministic and linearly indeterministic processes which were uncorrelated at all leads and lags; that is, if y_t is covariance stationary, then

y_t = x_t + z_t,

where x_t is a covariance stationary deterministic process (as defined above) and z_t is linearly indeterministic, with Cov(x_t, z_s) = 0 for all t and s. This result gives a theoretical underpinning to Box and Jenkins' proposal to model (seasonally-adjusted) scalar covariance stationary processes as ARMA processes.

A linearly indeterministic process y_t is said to be a generalized linear process if the white noise components {ε_t} are independently and identically distributed over t; it is said to be a linear process if it satisfies the additional restriction that the moving average coefficients are absolutely summable, i.e.,

Σ_{j=0}^{∞} |θ_j| < ∞.

Since absolute summability implies θ_j → 0 as j → ∞, and hence θ_j² ≤ |θ_j| for all j sufficiently large, absolute summability is a stronger requirement than square summability; under this stronger condition, it can be shown that

V_0 = lim_{T→∞} Var(√T ȳ_T) = Σ_{s=−∞}^{∞} γ_y(s) < ∞,

so the stronger requirement that the y_t process is a linear process is often imposed to ensure applicability of a central limit theorem for ȳ_T.

Common Factors and Identification

In a sense, ARMA processes are too flexible, in that low-order processes (i.e., those with p and q small) are nested in higher-order processes with certain parameter restrictions. In general, if y_t ~ ARMA(p, q), then it can always be rewritten as an ARMA(p + r, q + r) process for an arbitrary positive integer r by suitable “generalized differencing”. For example, suppose y_t = ε_t ~ WN(σ²), so that y_t ~ ARMA(0, 0). Then for any λ with |λ| < 1,

y_t − λ y_{t−1} = ε_t − λ ε_{t−1},

or

y_t = λ y_{t−1} + ε_t − λ ε_{t−1},

so y_t ~ ARMA(1, 1) with a particular restriction on the parameters (i.e., the sum of the first-order autoregressive and moving average coefficients is zero). For this example the redundancy is easy to find, but for more complicated ARMA processes the restrictions on the parameters may be difficult to find in the population, and even harder to detect in estimation.
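The redundancy in this example can be seen directly in simulation: starting the ARMA(1, 1) recursion at y_0 = ε_0, the autoregressive and moving average terms cancel and the generated series is exactly the original white noise (λ = 0.5 below is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(7)

# White noise rewritten as a redundant ARMA(1,1): y_t = lam*y_{t-1} + eps_t - lam*eps_{t-1}
lam, T = 0.5, 1000
eps = rng.normal(size=T)

y = np.empty(T)
y[0] = eps[0]
for t in range(1, T):
    y[t] = lam * y[t - 1] + eps[t] - lam * eps[t - 1]

# The common AR and MA factors cancel, so y_t is just the original white noise process
print(np.allclose(y, eps))   # True
```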

Box and Jenkins' proposed solution to this common factors problem, which they called their “principle of parsimony”, is simple enough – just pick p and q to be small enough to do the job (of forecasting, etc.). To implement this general idea, however, they proposed a methodology for choosing p and q which they termed time series identification procedures. In econometric applications, the tradition has been to consider only purely autoregressive processes, i.e., to assume that y_t ~ AR(p) for some value of p (chosen in practice by a suitable model selection procedure). Purely autoregressive processes, while typically requiring a higher number of parameters to approximate complicated dynamic patterns, do not suffer from the common factor problem, since a redundant generalized difference in the autoregressive component is accompanied by an error term which is not white noise (i.e., q = r > 0). Furthermore, as will be discussed later, purely autoregressive processes are simpler to estimate, requiring only linear (not nonlinear) LS estimation.
