Lecture 4: Estimation of ARIMA models
Florian Pelgrin
University of Lausanne, École des HEC Department of mathematics (IMEA-Nice)
Sept. 2011 - Dec. 2011
Florian Pelgrin (HEC) Univariate time series Sept. 2011 - Dec. 2011 1 / 50 Road map
1. Introduction
   1.1. Overview
   1.2. Framework
2. Sample moments
3. (Nonlinear) Least squares method
   3.1. Least squares estimation
   3.2. Nonlinear least squares estimation
   3.3. Discussion
4. (Generalized) Method of moments
   4.1. Methods of moments and Yule-Walker estimation
   4.2. Generalized method of moments
5. Maximum likelihood estimation
   5.1. Overview
   5.2. Estimation
1. Introduction
1.1. Overview
Provide an overview of some estimation methods for linear time series models :
Sample moments estimation ;
(Nonlinear) Linear least squares method ;
Maximum likelihood estimation ;
(Generalized) Method of moments.
Other methods are available (e.g., Bayesian estimation or Kalman filtering through state-space models) !
Broadly speaking, these methods consist in estimating the parameters of interest (autoregressive coefficients, moving average coefficients, and variance of the innovation process) as follows :
$$\underset{\delta}{\text{Optimize}} \; Q_T(\delta) \quad \text{s.t. (nonlinear) linear equality and inequality constraints}$$
where $Q_T$ is the objective function and $\delta$ is the vector of parameters of interest.
Remark : (Nonlinear) Linear equality and inequality constraints may represent the fundamentalness conditions.
Examples
(Nonlinear) Linear least squares methods aim at minimizing the sum of squared residuals ;
Maximum likelihood estimation aims at maximizing the (log-) likelihood function ;
Generalized method of moments aims at minimizing the distance between the sample counterpart of the theoretical moment conditions and zero (using a weighting matrix).
These methods differ in terms of :
Assumptions ;
"Nature" of the model (e.g., MA versus AR) ;
Finite and large sample properties ;
Etc.
High-quality software packages (EViews, SAS, S-PLUS, Stata, etc.) are available ! However, it is important to know the estimation options (default procedure, optimization algorithm, choice of initial conditions) and to keep in mind that these estimation techniques do not all perform equally well and that their performance depends on the nature of the model.
1.2. Framework
Suppose that $(X_t)$ is correctly specified as an ARIMA(p,d,q) model
$$\Phi(L)\Delta^d X_t = \Theta(L)\epsilon_t$$
where $\epsilon_t$ is a weak white noise $(0, \sigma_\epsilon^2)$ and
$$\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p \quad \text{with } \phi_p \neq 0$$
$$\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q \quad \text{with } \theta_q \neq 0.$$
Assume that
1. The model order (p, d, and q) is known ;
2. The data have zero mean ;
3. The series is weakly stationary (e.g., after differencing d times, etc.).
2. Sample moments
Let $(X_1, \cdots, X_T)$ represent a sample of size $T$ from a covariance-stationary process with
$$E[X_t] = m_X \text{ for all } t, \qquad \mathrm{Cov}[X_t, X_{t-h}] = \gamma_X(h) \text{ for all } t.$$
If nothing is known about the distribution of the time series, (non-parametric) estimation can be provided by :
Mean
$$\hat m_X \equiv \bar X_T = \frac{1}{T}\sum_{t=1}^{T} X_t$$
Autocovariance
$$\hat\gamma_X(h) = \frac{1}{T}\sum_{t=1}^{T-|h|} (X_t - \bar X_T)(X_{t+|h|} - \bar X_T)$$
or
$$\hat\gamma_X(h) = \frac{1}{T-|h|}\sum_{t=1}^{T-|h|} (X_t - \bar X_T)(X_{t+|h|} - \bar X_T).$$
Autocorrelation
$$\hat\rho_X(h) = \frac{\hat\gamma_X(h)}{\hat\gamma_X(0)}.$$
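The three sample moments above can be sketched in a few lines of Python. This is only an illustration: the function names and the flag `use_T_minus_h` (which switches between the two autocovariance divisors) are hypothetical and not part of the lecture.

```python
def sample_mean(x):
    """Sample mean of a sequence of observations."""
    return sum(x) / len(x)


def sample_autocov(x, h, use_T_minus_h=False):
    """Sample autocovariance at lag h.

    Divides by T by default, or by T - |h| when use_T_minus_h is True,
    matching the two versions of the estimator given in the text.
    """
    T = len(x)
    h = abs(h)
    xbar = sample_mean(x)
    s = sum((x[t] - xbar) * (x[t + h] - xbar) for t in range(T - h))
    return s / (T - h if use_T_minus_h else T)


def sample_autocorr(x, h):
    """Sample autocorrelation at lag h (T-divisor version)."""
    return sample_autocov(x, h) / sample_autocov(x, 0)


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy data, for illustration only
print(sample_mean(x), sample_autocov(x, 0), sample_autocorr(x, 1))
```

The T-divisor version is the one usually preferred in practice because it guarantees a positive semi-definite sample autocovariance sequence.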
3. (Nonlinear) Least squares method
3.1. Least squares estimation
Consider the linear regression model ($t = 1, \cdots, T$) :
$$y_t = x_t' b_0 + \epsilon_t$$
where $(y_t, x_t')'$ are i.i.d. (H1), the regressors are stochastic (H2), $x_t'$ is the $t$th row of the $X$ matrix, and the following assumptions hold :
1. Exogeneity (H3)
$$E[\epsilon_t \mid x_t] = 0 \text{ for all } t ;$$
2. Spherical error terms (H4)
$$V[\epsilon_t \mid x_t] = \sigma_0^2 \text{ for all } t ; \qquad \mathrm{Cov}[\epsilon_t, \epsilon_{t'} \mid x_t, x_{t'}] = 0 \text{ for } t \neq t'$$
3. Rank condition (H5)
$$\mathrm{rank}\left[E(x_t x_t')\right] = k.$$
Definition
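The ordinary least squares estimator of $b_0$, $\hat b_{ols}$, defined by the following minimization program
$$\hat b_{ols} = \underset{b_0}{\arg\min} \sum_{t=1}^{T} \epsilon_t^2 = \underset{b_0}{\arg\min} \sum_{t=1}^{T} (y_t - x_t' b_0)^2$$
is given by
$$\hat b_{ols} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1} \left(\sum_{t=1}^{T} x_t y_t\right).$$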
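Proposition
Under H1-H5, the ordinary least squares estimator of $b_0$ is weakly consistent
$$\hat b_{ols} \xrightarrow[T \to \infty]{p} b_0$$
and is asymptotically normally distributed
$$\sqrt{T}\left(\hat b_{ols} - b_0\right) \xrightarrow{\ell} N\left(0_{k\times 1}, \sigma_0^2 \, E[x_t x_t']^{-1}\right).$$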
Example : AR(1) estimation
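Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\phi| < 1$) :
$$X_t = \phi X_{t-1} + \epsilon_t$$
where $(\epsilon_t)$ is the innovation process of $(X_t)$. The ordinary least squares estimator of $\phi$ is defined to be :
$$\hat\phi_{ols} = \left(\sum_{t=2}^{T} x_{t-1}^2\right)^{-1} \left(\sum_{t=2}^{T} x_{t-1} x_t\right)$$
Moreover
$$\sqrt{T}\left(\hat\phi_{ols} - \phi\right) \xrightarrow{\ell} N\left(0, 1 - \phi^2\right).$$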
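As a rough numerical check of the AR(1) formulas, the sketch below simulates a Gaussian AR(1) and applies the OLS ratio. The simulation design (unit-variance Gaussian innovations, a burn-in of 200 draws, the seed) and the function names are assumptions made for illustration only.

```python
import random


def simulate_ar1(phi, T, seed=0, burn=200):
    """Simulate T observations of X_t = phi * X_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(T + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:              # drop the burn-in so the start value is forgotten
            path.append(x)
    return path


def ols_ar1(x):
    """OLS estimate of phi: sum(x_{t-1} * x_t) / sum(x_{t-1}^2) over t = 2..T."""
    num = sum(x[t - 1] * x[t] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den


x = simulate_ar1(phi=0.6, T=2000)
phi_hat = ols_ar1(x)
# Standard error implied by sqrt(T) * (phi_hat - phi) -> N(0, 1 - phi^2)
se = ((1.0 - phi_hat ** 2) / len(x)) ** 0.5
print(phi_hat, se)
```

With T = 2000 the estimate should typically land within a few hundredths of the true value, in line with the asymptotic variance $1 - \phi^2$.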
3.2. Nonlinear least squares estimation
Definition
Consider the nonlinear regression model defined by ($t = 1, \cdots, T$) :
$$y_t = h(x_t'; b_0) + \epsilon_t$$
where $h$ is a nonlinear function (w.r.t. $b_0$). The nonlinear least squares estimator of $b_0$, $\hat b_{nls}$, satisfies the following minimization program :
$$\hat b_{nls} = \underset{b_0}{\arg\min} \sum_{t=1}^{T} \epsilon_t^2 = \underset{b_0}{\arg\min} \sum_{t=1}^{T} \left(y_t - h(x_t'; b_0)\right)^2.$$
Example : MA(1) estimation
Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\theta| < 1$) :
$$X_t = \epsilon_t + \theta \epsilon_{t-1}$$
where $(\epsilon_t)$ is the innovation process of $(X_t)$.
The ordinary least squares estimator of $\theta$, which is defined by
$$\hat\theta_{ols} = \underset{\theta}{\arg\min} \sum_{t=2}^{T} (x_t - \theta \epsilon_{t-1})^2,$$
is not feasible (since $\epsilon_{t-1}$ is not observable !).
Example : MA(1) estimation (cont'd)
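Conditionally on $\epsilon_0$,
$$x_1 = \epsilon_1 + \theta \epsilon_0 \Leftrightarrow \epsilon_1 = x_1 - \theta \epsilon_0$$
$$x_2 = \epsilon_2 + \theta \epsilon_1 \Leftrightarrow \epsilon_2 = x_2 - \theta \epsilon_1 \Leftrightarrow \epsilon_2 = x_2 - \theta (x_1 - \theta \epsilon_0)$$
$$\vdots$$
$$x_{t-1} = \epsilon_{t-1} + \theta \epsilon_{t-2} \Leftrightarrow \epsilon_{t-1} = \sum_{k=0}^{t-2} (-\theta)^k x_{t-1-k} + (-\theta)^{t-1} \epsilon_0.$$
The (conditional) nonlinear least squares estimator of $\theta$ is defined by (assuming that $\epsilon_0$ is negligible)
$$\hat\theta_{nls} = \underset{\theta}{\arg\min} \sum_{t=2}^{T} \left(x_t - \theta \sum_{k=0}^{t-2} (-\theta)^k x_{t-1-k}\right)^2$$
and
$$\sqrt{T}\left(\hat\theta_{nls} - \theta\right) \xrightarrow{\ell} N\left(0, 1 - \theta^2\right).$$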
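The conditional criterion can be evaluated without forming the inner sum explicitly: setting $\epsilon_0 = 0$, the recursion $\epsilon_t = x_t - \theta \epsilon_{t-1}$ reproduces the same residuals. The sketch below uses that recursion and a crude grid search over $(-1, 1)$ in place of a proper numerical optimizer. The grid step, the simulated data and all function names are assumptions made for illustration.

```python
import random


def simulate_ma1(theta, T, seed=1):
    """Simulate X_t = eps_t + theta * eps_{t-1} with eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    prev, x = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        eps = rng.gauss(0.0, 1.0)
        x.append(eps + theta * prev)
        prev = eps
    return x


def css_ma1(theta, x):
    """Conditional sum of squares, using eps_0 = 0 and eps_t = x_t - theta * eps_{t-1}."""
    eps, ssr = 0.0, 0.0
    for t, xt in enumerate(x):
        eps = xt - theta * eps
        if t >= 1:                 # the sum in the text runs over t = 2..T
            ssr += eps ** 2
    return ssr


def nls_ma1(x, step=0.01):
    """Grid search over theta in (-1, 1), i.e. the invertible region."""
    grid = [k * step for k in range(-99, 100)]
    return min(grid, key=lambda th: css_ma1(th, x))


x = simulate_ma1(theta=0.5, T=2000)
print(nls_ma1(x))
```

Restricting the grid to $|\theta| < 1$ is one simple way to impose the fundamentalness constraint; a gradient-based optimizer would need that constraint handled separately.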
Example : Least squares estimation using backcasting procedure
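Consider the fundamental representation of a MA(q) :
$$X_t = u_t$$
$$u_t = \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$$
Given some initial values $\delta^{(0)} = \left(\theta_1^{(0)}, \cdots, \theta_q^{(0)}\right)'$,
Compute the unconditional residuals $\hat u_t$
$$\hat u_t = X_t$$
Backcast values of $\epsilon_t$ for $t = -(q-1), \cdots, T$ using the backward recursion
$$\tilde\epsilon_t = \hat u_t - \theta_1^{(0)} \tilde\epsilon_{t+1} - \cdots - \theta_q^{(0)} \tilde\epsilon_{t+q}$$
where the $q$ values for $\epsilon_t$ beyond the estimation sample are set to zero : $\tilde\epsilon_{T+1} = \cdots = \tilde\epsilon_{T+q} = 0$, and $\hat u_t = 0$ for $t < 1$.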
Example : Least squares estimation using backcasting procedure (cont'd)
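Estimate the values of $\epsilon_t$ using a forward recursion
$$\hat\epsilon_t = \hat u_t - \theta_1^{(0)} \hat\epsilon_{t-1} - \cdots - \theta_q^{(0)} \hat\epsilon_{t-q}$$
initialized with the backcast values ($\hat\epsilon_t = \tilde\epsilon_t$ for $t \leq 0$).
Minimize the sum of squared residuals using the fitted values $\hat\epsilon_t$ :
$$\sum_{t=q+1}^{T} \left(X_t - \hat\epsilon_t - \theta_1 \hat\epsilon_{t-1} - \cdots - \theta_q \hat\epsilon_{t-q}\right)^2$$
and find $\hat\delta^{(1)} = \left(\theta_1^{(1)}, \cdots, \theta_q^{(1)}\right)'$. Repeat the backcast step, forward recursion, and minimization procedures until the estimates of the moving average part converge.
Remark : The backcast step can be turned off ($\tilde\epsilon_{-(q-1)} = \cdots = \tilde\epsilon_0 = 0$), in which case only the forward recursion is used.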
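A minimal sketch of the backward and forward recursions for an MA(1) follows. It rests on one simplifying assumption: instead of iterating between residual computation and a minimization step, it treats the two recursions as defining a sum of squared residuals for each candidate $\theta$ and minimizes that function on a grid. This is one possible reading of the procedure, not necessarily what any particular software package does. All function names are hypothetical.

```python
import random


def simulate_ma1(theta, T, seed=3):
    """Simulate X_t = eps_t + theta * eps_{t-1} with eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    prev, x = rng.gauss(0.0, 1.0), []
    for _ in range(T):
        eps = rng.gauss(0.0, 1.0)
        x.append(eps + theta * prev)
        prev = eps
    return x


def residuals_ma1(theta, x, backcast=True):
    """Return (eps_0, [eps_hat_1, ..., eps_hat_T]) for a given theta (q = 1)."""
    eps0 = 0.0                     # presample value when the backcast is turned off
    if backcast:
        nxt = 0.0                  # eps_tilde_{T+1} = 0
        for u in reversed(x):      # backward recursion, t = T, ..., 1 (u_t = X_t)
            nxt = u - theta * nxt
        eps0 = -theta * nxt        # t = 0, where u_0 = 0
    eps, prev = [], eps0
    for u in x:                    # forward recursion, t = 1, ..., T
        prev = u - theta * prev
        eps.append(prev)
    return eps0, eps


def estimate_ma1(x, backcast=True, step=0.01):
    """Grid search for the theta in (-1, 1) minimizing the sum of squared residuals."""
    def ssr(theta):
        return sum(e * e for e in residuals_ma1(theta, x, backcast)[1])
    return min((k * step for k in range(-99, 100)), key=ssr)


x = simulate_ma1(theta=0.5, T=1000)
print(estimate_ma1(x, backcast=True), estimate_ma1(x, backcast=False))
```

On samples of this size the two variants usually give very similar estimates; the presample treatment matters more for short series or when $|\theta|$ is close to one.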
3.3. Discussion
(Linear and nonlinear) least squares estimation is often used in practice. These methods lead to simple algorithms (e.g., backcasting procedure for ARMA models) and can often be used to initialize more "sophisticated" estimation methods.
These methods contrast with exact methods (see maximum likelihood estimation).
Nonlinear estimation requires numerical optimization algorithms :
Initial conditions ;
Number of iterations and stopping criteria ;
Local and global convergence ;
Fundamental representation and constraints.
4. (Generalized) Method of moments
4.1. Methods of moments and Yule-Walker estimation
Definition
Suppose there is a set of $k$ conditions
$$S_T - g(\delta) = 0_{k\times 1}$$
where $S_T \in \mathbb{R}^k$ denotes a vector of theoretical moments, $\delta \in \mathbb{R}^k$ is a vector of parameters, and $g : \mathbb{R}^k \to \mathbb{R}^k$ defines a (bijective) mapping between $S_T$ and $\delta$. The method of moments estimator of $\delta$, $\hat\delta_{mm}$, is defined to be the value of $\delta$ such that
$$\hat S_T - g(\hat\delta_{mm}) = 0_{k\times 1}$$
where $\hat S_T$ is the estimate (empirical counterpart) of $S_T$.
Example 1 : MA(1) estimation
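Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\theta| < 1$) :
$$X_t = \epsilon_t + \theta \epsilon_{t-1}$$
where $(\epsilon_t)$ is the innovation process of $(X_t)$.
One has
$$S_T - g(\delta) = \begin{pmatrix} \gamma_X(0) - (1 + \theta^2)\sigma_\epsilon^2 \\ \gamma_X(1) - \theta \sigma_\epsilon^2 \end{pmatrix} = 0_{2\times 1}$$
where $\delta = (\theta, \sigma_\epsilon^2)'$ and $S_T = (\gamma_X(0), \gamma_X(1))' = \left(E[X_t^2], E[X_t X_{t-1}]\right)'$.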
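The method of moments estimator of $\delta$, $\hat\delta_{mm}$, solves
$$\hat S_T - g(\hat\delta_{mm}) = \begin{pmatrix} \hat\gamma_X(0) - (1 + \hat\theta_{mm}^2)\hat\sigma_{\epsilon,mm}^2 \\ \hat\gamma_X(1) - \hat\theta_{mm} \hat\sigma_{\epsilon,mm}^2 \end{pmatrix} = 0_{2\times 1}$$
Therefore (using the constraint $|\theta| < 1$, and provided $0 < |\hat\rho_X(1)| \leq 1/2$ so that a real solution exists)
$$\hat\theta_{mm} = \frac{1 - \sqrt{1 - 4\hat\rho_X(1)^2}}{2\hat\rho_X(1)}$$
and
$$\hat\sigma_{\epsilon,mm}^2 = \frac{\hat\gamma_X(0)}{1 + \hat\theta_{mm}^2} = \frac{2\hat\rho_X(1)^2 \, \hat\gamma_X(0)}{1 - \sqrt{1 - 4\hat\rho_X(1)^2}}.$$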
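The mapping from the two sample moments to the MA(1) parameters can be sketched as below. The innovation variance is recovered from the first moment condition, $\gamma_X(0) = (1 + \theta^2)\sigma_\epsilon^2$. The function name and the choice to return `None` when no real root exists are illustrative assumptions.

```python
import math


def mm_ma1(gamma0, gamma1):
    """Method-of-moments estimates (theta, sigma2) of an invertible MA(1).

    Solves rho(1) = theta / (1 + theta^2) for the root with |theta| <= 1.
    Returns None when |rho(1)| > 1/2, where no real solution exists.
    """
    rho1 = gamma1 / gamma0
    if rho1 == 0.0:
        return 0.0, gamma0         # white noise: theta = 0, sigma2 = gamma(0)
    disc = 1.0 - 4.0 * rho1 ** 2
    if disc < 0.0:
        return None
    theta = (1.0 - math.sqrt(disc)) / (2.0 * rho1)
    sigma2 = gamma0 / (1.0 + theta ** 2)   # from gamma(0) = (1 + theta^2) * sigma2
    return theta, sigma2


# Theoretical moments of an MA(1) with theta = 0.5 and sigma2 = 1
print(mm_ma1(1.25, 0.5))
```

The `None` branch illustrates a practical weakness of this estimator: in finite samples $|\hat\rho_X(1)|$ can exceed 1/2 even when the true process is an invertible MA(1).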
Example 2 : Yule-Walker estimation
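Let $(X_t)$ be a causal AR(p). The Yule-Walker equations write
$$E\left[X_{t-k}\left(X_t - \sum_{j=1}^{p} \phi_j X_{t-j}\right)\right] = \begin{cases} \sigma_\epsilon^2 & \text{for } k = 0 \\ 0 & \text{for all } k = 1, \cdots, p \end{cases}$$
or
$$S_T - g(\delta) = 0_{(p+1)\times 1} \Leftrightarrow \begin{cases} \gamma_X(0) - \phi'\gamma_p = \sigma_\epsilon^2 \\ \gamma_p - \Gamma_p \phi = 0_{p\times 1} \end{cases}$$
where $\gamma_p = (\gamma_X(1), \cdots, \gamma_X(p))' = \left(E[X_t X_{t-1}], \cdots, E[X_t X_{t-p}]\right)'$, $\Gamma_p = [\gamma_X(i-j)]_{i,j=1}^{p}$, $S_T = (\gamma_X(0), \gamma_p')'$, $\phi = (\phi_1, \cdots, \phi_p)'$, $\delta = (\phi', \sigma_\epsilon^2)'$.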
The method of moments estimator of $(\phi', \sigma_\epsilon^2)'$ solves
$$\hat\phi_{mm} = \hat\Gamma_p^{-1} \hat\gamma_p$$
and
$$\hat\sigma_{\epsilon,mm}^2 = \hat\gamma_X(0) - \hat\phi_{mm}' \hat\gamma_p.$$
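One way to solve the Yule-Walker system without explicitly inverting $\hat\Gamma_p$ is the Levinson-Durbin recursion, which exploits the Toeplitz structure of the autocovariance matrix. Note that this is a swapped-in solution technique, not the formula above as written, although it returns the same $\hat\phi_{mm}$ and $\hat\sigma_{\epsilon,mm}^2$. The function name and interface are assumptions.

```python
def yule_walker(gamma, p):
    """Solve the Yule-Walker equations by the Levinson-Durbin recursion.

    gamma: autocovariances [gamma(0), gamma(1), ..., gamma(p)] (or autocorrelations).
    Returns (phi, sigma2), where sigma2 = gamma(0) - phi' gamma_p.
    """
    phi = []
    v = gamma[0]                   # innovation variance of the order-0 model
    for k in range(1, p + 1):
        acc = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        kappa = acc / v            # partial autocorrelation at lag k
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        v *= 1.0 - kappa ** 2
    return phi, v


# Theoretical autocorrelations of an AR(2) with phi = (0.5, 0.3)
rho1 = 0.5 / (1.0 - 0.3)
rho2 = 0.5 * rho1 + 0.3
print(yule_walker([1.0, rho1, rho2], 2))
```

Feeding in autocorrelations instead of autocovariances leaves the coefficients unchanged and returns the innovation variance as a fraction of $\gamma_X(0)$.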
Question : Difference(s) with ordinary least squares estimation of an AR(p) ?
Discussion
It is necessary to use theoretical moments that can be well-estimated from the data (or given the sample size).
Method of moments estimation depends on the mapping $g$ and especially on the "smoothness" of the inverse $g^{-1}$ (need to avoid that the inverse map has a "large derivative").
Summarizing the data through the sample moments incurs a loss of information; the main exception is when the sample moments are sufficient (in a statistical sense).
Instead of reducing a sample of size $T$ to a "sample" of empirical moments of size $k$, one may reduce the sample to more "empirical moments" than there are parameters: this is the so-called generalized method of moments.
4.2. Generalized method of moments
Intuition : Consider the linear regression model in an i.i.d. context
$$y_t = x_t' b_0 + \epsilon_t$$
The exogeneity assumption $E[\epsilon_t \mid X] = 0$ implies the $k$ moment conditions (for $t = 1, \cdots, T$)
$$E\left[x_t (y_t - x_t' b_0)\right] = 0_{k\times 1}.$$
Using the empirical counterpart (analogy principle)
$$\frac{1}{T}\sum_{t=1}^{T} x_t (y_t - x_t' b_0) = 0_{k\times 1}$$
This is a system of $k$ equations with $k$ unknowns (just-identified), whose solution is
$$\hat b_{ols} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1} \left(\sum_{t=1}^{T} x_t y_t\right).$$
More generally, suppose that there exists a set of instruments $z_t \in \mathbb{R}^p$ (with $p \geq k$), the instruments being (i) uncorrelated with the error terms and (ii) correlated with the explanatory variables, such that
$$E\left[z_t (y_t - x_t' b_0)\right] = 0_{p\times 1} \Leftrightarrow E\left[h(z_t; \delta_0)\right] = 0_{p\times 1}.$$
Using the empirical counterpart (analogy principle)
$$\frac{1}{T}\sum_{t=1}^{T} z_t (y_t - x_t' b_0) = 0_{p\times 1} \Leftrightarrow \frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta_0) = 0_{p\times 1}.$$
This system is overidentified for $p > k$. How can we find an estimate of $b_0$ ?
The answer is provided by the generalized method of moments, i.e. an estimation method based on estimating equations that set to zero the expectation of a vector function of the observations and the parameters of interest $b_0$.
The basic idea is to choose a value for $\delta$ such that the empirical counterpart is as close as possible to zero.
"As close as possible to zero" means that one minimizes the distance between $\frac{1}{T}\sum_{t=1}^{T} z_t (y_t - x_t' b_0)$ and $0_{p\times 1}$ using a weighting matrix (say, $W_T$).
Two issues :
1. Choice of instruments ;
2. Choice of the (optimal) weighting matrix.
Definition
Consider the set of moment conditions
$$E\left[h(z_t, \delta_0)\right] = 0_{p\times 1}$$
where $\delta_0 \in \mathbb{R}^k$, $p \geq k$, and $h$ is a $p$-dimensional (nonlinear) linear function of the parameters of interest and depends on (i.i.d.) observations. The generalized method of moments estimator of $\delta_0$ associated with $W_T$ (symmetric positive definite matrix) is a solution to the problem
$$\min_{\delta} \left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta)\right]' W_T \left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta)\right].$$
Definition
The generalized method of moments estimator of $\delta_0$ solves the first-order conditions
$$\left[\frac{1}{T}\sum_{t=1}^{T} \frac{\partial h(z_t; \hat\delta_{gmm})}{\partial \delta'}\right]' W_T \left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \hat\delta_{gmm})\right] = 0_{k\times 1}.$$
This estimator is consistent and asymptotically normally distributed.
Remark : The first-order conditions correspond to $k$ linear combinations of the moment conditions.
This estimator depends on $W_T$. There exists a best GMM estimator, i.e. an optimal weighting matrix.
The optimal weighting matrix is given by (i.i.d. context)
$$W_T^* = V\left[h(z_t, \delta_0)\right]^{-1} \equiv E\left[h(z_t, \delta_0) h(z_t, \delta_0)'\right]^{-1}$$
and the sample analog is
$$\left[\frac{1}{T}\sum_{t=1}^{T} h(z_t, \delta_0) h(z_t, \delta_0)'\right]^{-1}.$$
It does depend on $\delta_0$ !
How can we proceed in practice ?
Two-step GMM or iterated GMM method ;
One-step method (continuously updating estimator, exponential tilting estimator, empirical likelihood estimator, etc.).
Example : AR(1) estimation
Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\phi| < 1$) :
$$X_t = \phi X_{t-1} + \epsilon_t.$$
Since $(\epsilon_t)$ is the innovation process of $(X_t)$, one has the following moment conditions
$$E\left[x_{t-k} \epsilon_t\right] = 0 \text{ for } k > 0$$
$\Rightarrow$ Past values $x_{t-1}, \cdots, x_{t-h}$ can be used as instruments.
Let $z_t$ denote the set of instruments
$$z_t = (x_{t-1}, \cdots, x_{t-h})'.$$
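The moment conditions write
$$E\left[h(z_t, \delta_0)\right] = 0_{h\times 1}$$
where
$$h(z_t, \delta_0) = z_t (x_t - \phi x_{t-1}).$$
The empirical counterpart is defined to be
$$\frac{1}{T}\sum_{t=h+1}^{T} \begin{pmatrix} x_{t-1} \\ \vdots \\ x_{t-h} \end{pmatrix} (x_t - \phi x_{t-1}) = 0_{h\times 1}.$$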
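The 2S-GMM estimator of $\phi$ is a solution to the problem
$$\left[\sum_{t=h+1}^{T} \frac{\partial h(z_t; \hat\delta_{gmm})}{\partial \delta'}\right]' \hat\Omega_T^{-1} \left[\sum_{t=h+1}^{T} h(z_t; \hat\delta_{gmm})\right] = 0$$
(a single equation here, since $\delta = \phi$ is a scalar), where $\hat\Omega_T^{-1}$ is a first-step consistent estimate of the inverse of the variance-covariance matrix of the moment conditions and it accounts for the dependence of the moment conditions.
Remark : In a first step, one solves (using the identity matrix $I_h$ as weighting matrix)
$$\left[\sum_{t=h+1}^{T} \frac{\partial h(z_t; \hat\delta_{1,gmm})}{\partial \delta'}\right]' I_h \left[\sum_{t=h+1}^{T} h(z_t; \hat\delta_{1,gmm})\right] = 0$$
and then computes $\hat\Omega_T^{-1}(\hat\delta_{1,gmm}) \equiv \hat\Omega_T^{-1}$.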
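A compact sketch of the two-step procedure for the AR(1) case with $h = 2$ instruments is given below. Because the moment function is linear in $\phi$, the sample moments take the form $b - \phi a$ and the minimizer for a given weighting matrix $W$ has the closed form $a'Wb / a'Wa$. Two simplifying assumptions are made: the variance of the moment conditions is estimated by the plain average of outer products (no correction for serial correlation), and the $2 \times 2$ inverse is written out by hand. All names are hypothetical.

```python
import random


def simulate_ar1(phi, T, seed=2, burn=200):
    """Simulate T observations of X_t = phi * X_{t-1} + eps_t, eps_t ~ N(0, 1)."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for t in range(T + burn):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn:
            path.append(x)
    return path


def gmm_ar1_two_step(x):
    """Return (first-step, second-step) GMM estimates of phi, z_t = (x_{t-1}, x_{t-2})."""
    # One row per t: instruments z_t, regressand x_t, regressor x_{t-1}
    rows = [((x[t - 1], x[t - 2]), x[t], x[t - 1]) for t in range(2, len(x))]
    n = len(rows)
    # Sample moments are linear in phi: g(phi) = b - phi * a
    a = [sum(z[i] * xlag for z, _, xlag in rows) / n for i in range(2)]
    b = [sum(z[i] * xt for z, xt, _ in rows) / n for i in range(2)]

    def solve(W):
        # argmin over phi of (b - phi a)' W (b - phi a), for symmetric W
        Wa = [W[0][0] * a[0] + W[0][1] * a[1], W[1][0] * a[0] + W[1][1] * a[1]]
        return (Wa[0] * b[0] + Wa[1] * b[1]) / (Wa[0] * a[0] + Wa[1] * a[1])

    phi1 = solve([[1.0, 0.0], [0.0, 1.0]])     # step 1: identity weighting

    # Step 2: Omega_hat = (1/n) * sum h_t h_t', with h_t = z_t * (x_t - phi1 * x_{t-1})
    s00 = s01 = s11 = 0.0
    for z, xt, xlag in rows:
        e = xt - phi1 * xlag
        s00 += (z[0] * e) ** 2 / n
        s01 += z[0] * z[1] * e * e / n
        s11 += (z[1] * e) ** 2 / n
    det = s00 * s11 - s01 ** 2
    W = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]  # inverse of Omega_hat
    return phi1, solve(W)


x = simulate_ar1(phi=0.6, T=3000)
print(gmm_ar1_two_step(x))
```

Both steps give consistent estimates; the second step only changes how the two moment conditions are weighted.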
Discussion
Choice of instruments (optimal instruments ?) and of the weighting matrix ;
Bias-efficiency trade-off ;
Numerical procedures can be cumbersome !
Other methods can be written as a GMM estimation.
5. Maximum likelihood estimation
5.1. Overview
Consider the linear regression model ($t = 1, \cdots, T$) :
$$y_t = x_t' b_0 + \epsilon_t$$
where $(y_t, x_t')'$ are i.i.d., the regressors are stochastic, $x_t'$ is the $t$th row of the $X$ matrix, and
$$\epsilon_t \mid x_t \sim \text{i.i.d. } N(0, \sigma^2).$$
It follows that
$$y_t \mid x_t \sim \text{i.i.d. } N(x_t' b_0, \sigma^2).$$
One observes a sample of (conditionally) i.i.d. observations $(y_t, x_t')$. These observations are realizations of random variables with a known distribution but unknown parameters.
The basic idea is to maximize the probability of observing these realizations (given a known parametric distribution with unknown parameters).
The probability of observing these realizations is given by the joint density (pdf) evaluated at the observations.
Viewed as a function of the unknown parameters (and not of the observations !), this joint density is the likelihood function.
5.2. Estimation
Definition
The exact likelihood function of the linear regression model is defined to be
$$L(y, x; \delta_0) = \prod_{t=1}^{T} L_t(y_t, x_t; \delta_0)$$
where $L_t$ is the (exact) likelihood function for observation $t$
$$L_t(y_t, x_t; \delta_0) = f_{Y_t \mid X_t}(y_t \mid x_t; \delta_0) \times f_{X_t}(x_t),$$
where $f_{Y_t \mid X_t}$ (respectively, $f_{X_t}$) is the (conditional) probability density function of $Y_t \mid X_t$ (respectively, of $X_t$). Accordingly, the exact log-likelihood function is defined to be
$$\ell(y, x; \delta_0) = \sum_{t=1}^{T} \ell_t(y_t, x_t; \delta_0) \equiv \sum_{t=1}^{T} \log L_t(y_t, x_t; \delta_0).$$
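The exact likelihood function is given by
$$L(y, x; \delta_0) = \prod_{t=1}^{T} f_{Y_t \mid X_t}(y_t \mid x_t; \delta_0) \prod_{t=1}^{T} f_{X_t}(x_t)$$
The second right-hand side term is often neglected (especially if $f_X$ does not depend on $\delta_0$) and one considers the conditional likelihood function
$$\begin{aligned} L(y \mid x; \delta_0) &= \prod_{t=1}^{T} f_{Y_t \mid X_t}(y_t \mid x_t; \delta_0) \\ &= \prod_{t=1}^{T} \left(\sigma^2 2\pi\right)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y_t - x_t' b_0)^2\right) \\ &= \left(\sigma^2 2\pi\right)^{-T/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x_t' b_0)^2\right) \end{aligned}$$
and the conditional log-likelihood function is given by
$$\ell(y \mid x; \delta_0) = -\frac{T}{2}\log(\sigma^2) - \frac{T}{2}\log(2\pi) - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_t - x_t' b_0)^2.$$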
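The conditional log-likelihood can be evaluated directly. The sketch below does so for a single regressor and compares the value at the closed-form maximizer with values at nearby parameters. It relies on the standard result that, under Gaussian errors, the maximizer is the OLS coefficient together with $\hat\sigma^2 = \text{SSR}/T$. The toy data and function names are assumptions made for illustration.

```python
import math


def cond_loglik(b, sigma2, y, x):
    """Conditional Gaussian log-likelihood of y_t = x_t * b + eps_t (scalar regressor)."""
    T = len(y)
    ssr = sum((yt - xt * b) ** 2 for yt, xt in zip(y, x))
    return (-0.5 * T * math.log(sigma2)
            - 0.5 * T * math.log(2.0 * math.pi)
            - ssr / (2.0 * sigma2))


def mle(y, x):
    """Closed-form maximizer: b_hat is the OLS coefficient, sigma2_hat = SSR / T."""
    b = sum(xt * yt for xt, yt in zip(x, y)) / sum(xt ** 2 for xt in x)
    sigma2 = sum((yt - xt * b) ** 2 for yt, xt in zip(y, x)) / len(y)
    return b, sigma2


y = [1.1, 1.9, 3.2, 3.9]   # toy data, for illustration only
x = [1.0, 2.0, 3.0, 4.0]
b_hat, s2_hat = mle(y, x)
print(b_hat, s2_hat, cond_loglik(b_hat, s2_hat, y, x))
# Moving away from (b_hat, s2_hat) lowers the log-likelihood
print(cond_loglik(b_hat + 0.05, s2_hat, y, x), cond_loglik(b_hat, 1.5 * s2_hat, y, x))
```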
Definition
The maximum likelihood estimator of $\delta_0$ in the linear regression model, which is the solution of the problem
$$\max_{\delta} L(y \mid x; \delta) \Leftrightarrow \max_{\delta} \ell(y \mid x; \delta),$$
is asymptotically unbiased and efficient, and normally distributed
$$\sqrt{T}\left(\hat\delta_{ml} - \delta_0\right) \xrightarrow{\ell} N\left(0_{k\times 1}, I_F^{-1}\right)$$
where $I_F$ is the Fisher information matrix.
Example : AR(1) estimation
Consider the following AR(1) representation (|φ| < 1)
$$X_t = \phi X_{t-1} + \epsilon_t.$$