Lecture 4: Estimation of ARIMA models

Florian Pelgrin

University of Lausanne, École des HEC; Department of Mathematics (IMEA-Nice)

Sept. 2011 - Dec. 2011

Road map

1. Introduction
   Overview
   Framework

2. Sample moments

3. (Nonlinear) Least squares method
   Least squares estimation
   Nonlinear least squares estimation
   Discussion

4. (Generalized) Method of moments
   Methods of moments and Yule-Walker estimation
   Generalized method of moments

5. Maximum likelihood estimation
   Overview
   Estimation

1. Introduction
1.1. Overview

Provide an overview of some estimation methods for linear time series models :

Sample moments estimation ;

(Non)linear least squares method ;

Maximum likelihood estimation

(Generalized) Method of moments.

Other methods are available (e.g., Bayesian estimation or Kalman filtering through state-space models) !


Broadly speaking, these methods consist in estimating the parameters of interest (autoregressive coefficients, moving average coefficients, and variance of the innovation process) as follows :

$\underset{\delta}{\text{Optimize}}\; Q_T(\delta)$ s.t. (non)linear equality and inequality constraints

where $Q_T$ is the objective function and $\delta$ is the vector of parameters of interest.

Remark : (Non)linear equality and inequality constraints may represent the fundamentalness conditions.


Examples

(Non)linear least squares methods aim at minimizing the sum of squared residuals ;

Maximum likelihood estimation aims at maximizing the (log-) likelihood function ;

Generalized method of moments aims at minimizing the distance between the empirical counterparts of the theoretical moment conditions and zero (using a weighting matrix).


These methods differ in terms of :

Assumptions ;

"Nature" of the model (e.g., MA versus AR) ;

Finite and large sample properties ;

Etc.

High-quality software programs (EViews, SAS, S-Plus, Stata, etc.) are available ! However, it is important to know the estimation options (default procedure, optimization algorithm, choice of initial conditions) and to keep in mind that these estimation techniques do not all perform equally and do depend on the nature of the model...

1.2. Framework

Suppose that $(X_t)$ is correctly specified as an ARIMA(p, d, q) model

$\Phi(L)\Delta^d X_t = \Theta(L)\epsilon_t$

where $(\epsilon_t)$ is a weak white noise $(0, \sigma_\epsilon^2)$ and

$\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ with $\phi_p \neq 0$
$\Theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$ with $\theta_q \neq 0$.

Assume that
1. The model order (p, d, and q) is known ;
2. The data has zero mean ;
3. The series is weakly stationary (e.g., after the d-difference transformation, etc).

2. Sample moments

Let $(X_1, \cdots, X_T)$ represent a sample of size $T$ from a covariance-stationary process with

$\mathbb{E}[X_t] = m_X$ for all $t$
$\mathrm{Cov}[X_t, X_{t-h}] = \gamma_X(h)$ for all $t$.

If nothing is known about the distribution of the time series, (non-parametric) estimation can be provided by :

Mean

$\hat{m}_X \equiv \bar{X}_T = \frac{1}{T}\sum_{t=1}^{T} X_t$

Autocovariance

$\hat{\gamma}_X(h) = \frac{1}{T}\sum_{t=1}^{T-|h|}\left(X_t - \bar{X}_T\right)\left(X_{t+|h|} - \bar{X}_T\right)$

or

$\hat{\gamma}_X(h) = \frac{1}{T-|h|}\sum_{t=1}^{T-|h|}\left(X_t - \bar{X}_T\right)\left(X_{t+|h|} - \bar{X}_T\right).$

Autocorrelation

$\hat{\rho}_X(h) = \frac{\hat{\gamma}_X(h)}{\hat{\gamma}_X(0)}.$
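As an illustration, a minimal NumPy sketch of these sample moments (the helper name, the $1/T$ convention, and the simulated AR(1) series are illustrative assumptions, not part of the lecture):

```python
import numpy as np

def sample_moments(x, h):
    """Sample mean, autocovariance (1/T convention) and autocorrelation at lag h."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xbar = x.mean()                                              # hat{m}_X
    h = abs(h)
    gamma_h = ((x[:T - h] - xbar) * (x[h:] - xbar)).sum() / T    # hat{gamma}_X(h)
    gamma_0 = ((x - xbar) ** 2).sum() / T                        # hat{gamma}_X(0)
    return xbar, gamma_h, gamma_h / gamma_0                      # hat{rho}_X(h)

# Usage on a simulated AR(1) series (illustrative)
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(sample_moments(x, 1))
```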

3. (Nonlinear) Least squares method
3.1. Least squares estimation

Consider the model ($t = 1, \cdots, T$) :

$y_t = x_t' b_0 + \epsilon_t$

where $(y_t, x_t')$ are i.i.d. (H1), the regressors are stochastic (H2), $x_t'$ is the $t$-th row of the $X$ matrix, and the following assumptions hold :

1. Exogeneity (H3) : $\mathbb{E}[\epsilon_t \mid x_t] = 0$ for all $t$ ;

2. Spherical error terms (H4) : $\mathbb{V}[\epsilon_t \mid x_t] = \sigma_0^2$ for all $t$, and $\mathrm{Cov}[\epsilon_t, \epsilon_{t'} \mid x_t, x_{t'}] = 0$ for $t \neq t'$ ;

3. Rank condition (H5) : $\mathrm{rank}\left[\mathbb{E}(x_t x_t')\right] = k$.


Definition

The ordinary least squares estimation of $b_0$, $\hat{b}_{ols}$, defined from the following minimization program

$\hat{b}_{ols} = \underset{b_0}{\operatorname{argmin}} \sum_{t=1}^{T} \epsilon_t^2 = \underset{b_0}{\operatorname{argmin}} \sum_{t=1}^{T} \left(y_t - x_t' b_0\right)^2,$

is given by

$\hat{b}_{ols} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\left(\sum_{t=1}^{T} x_t y_t\right).$


Proposition

Under H1-H5, the ordinary least squares estimator of $b_0$ is weakly consistent,

$\hat{b}_{ols} \xrightarrow[T\to\infty]{p} b_0,$

and is asymptotically normally distributed,

$\sqrt{T}\left(\hat{b}_{ols} - b_0\right) \xrightarrow{\ell} \mathcal{N}\left(0_{k\times 1},\ \sigma_0^2\,\mathbb{E}\left[x_t x_t'\right]^{-1}\right).$


Example : AR(1) estimation

Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\phi| < 1$) :

$X_t = \phi X_{t-1} + \epsilon_t$

where $(\epsilon_t)$ is the innovation process of $(X_t)$. The ordinary least squares estimation of $\phi$ is defined to be :

$\hat{\phi}_{ols} = \left(\sum_{t=2}^{T} x_{t-1}^2\right)^{-1}\left(\sum_{t=2}^{T} x_{t-1} x_t\right).$

Moreover,

$\sqrt{T}\left(\hat{\phi}_{ols} - \phi\right) \xrightarrow{\ell} \mathcal{N}\left(0,\ 1 - \phi^2\right).$
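A minimal sketch of this estimator and its asymptotic standard error in NumPy (the simulated series and the helper name are illustrative assumptions):

```python
import numpy as np

def ar1_ols(x):
    """OLS estimate of phi in X_t = phi X_{t-1} + eps_t, with its asymptotic s.e."""
    x = np.asarray(x, dtype=float)
    num = (x[:-1] * x[1:]).sum()                  # sum_{t=2}^T x_{t-1} x_t
    den = (x[:-1] ** 2).sum()                     # sum_{t=2}^T x_{t-1}^2
    phi_hat = num / den
    se = np.sqrt((1.0 - phi_hat ** 2) / x.size)   # from sqrt(T)(phi_hat - phi) -> N(0, 1 - phi^2)
    return phi_hat, se

rng = np.random.default_rng(1)
x = np.zeros(1000)
for t in range(1, 1000):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
print(ar1_ols(x))
```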

3.2. Nonlinear least squares estimation

Definition

Consider the model defined by ($t = 1, \cdots, T$) :

$y_t = h(x_t'; b_0) + \epsilon_t$

where $h$ is a nonlinear function (w.r.t. $b_0$). The nonlinear least squares estimator of $b_0$, $\hat{b}_{nls}$, satisfies the following minimization program :

$\hat{b}_{nls} = \underset{b_0}{\operatorname{argmin}} \sum_{t=1}^{T} \epsilon_t^2 = \underset{b_0}{\operatorname{argmin}} \sum_{t=1}^{T} \left(y_t - h(x_t'; b_0)\right)^2.$


Example : MA(1) estimation

Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\theta| < 1$) :

$X_t = \epsilon_t + \theta\epsilon_{t-1}$

where $(\epsilon_t)$ is the innovation process of $(X_t)$.

The ordinary least squares estimation of $\theta$, which is defined by

$\hat{\theta}_{ols} = \underset{\theta}{\operatorname{argmin}} \sum_{t=2}^{T} \left(x_t - \theta\epsilon_{t-1}\right)^2,$

is not feasible (since $\epsilon_{t-1}$ is not observable !).

Example : MA(1) estimation (cont'd)

Conditionally on $\epsilon_0$,

$x_1 = \epsilon_1 + \theta\epsilon_0 \;\Leftrightarrow\; \epsilon_1 = x_1 - \theta\epsilon_0$

$x_2 = \epsilon_2 + \theta\epsilon_1 \;\Leftrightarrow\; \epsilon_2 = x_2 - \theta\epsilon_1 = x_2 - \theta(x_1 - \theta\epsilon_0)$

$\vdots$

$x_{t-1} = \epsilon_{t-1} + \theta\epsilon_{t-2} \;\Leftrightarrow\; \epsilon_{t-1} = \sum_{k=0}^{t-2}(-\theta)^k x_{t-1-k} + (-\theta)^{t-1}\epsilon_0.$

The (conditional) nonlinear least squares estimation of $\theta$ is defined by (assuming that the term in $\epsilon_0$ is negligible)

$\hat{\theta}_{nls} = \underset{\theta}{\operatorname{argmin}} \sum_{t=2}^{T}\left(x_t - \theta\sum_{k=0}^{t-2}(-\theta)^k x_{t-1-k}\right)^2$

and $\sqrt{T}\left(\hat{\theta}_{nls} - \theta\right) \xrightarrow{\ell} \mathcal{N}\left(0,\ 1 - \theta^2\right).$
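A hedged numerical sketch of this conditional criterion, written recursively as $\epsilon_t(\theta) = x_t - \theta\epsilon_{t-1}(\theta)$ with $\epsilon_0 = 0$ and minimized with SciPy (the optimizer, bounds, and simulated data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ma1_css(theta, x):
    """Conditional sum of squares for an MA(1), with eps_0 set to zero."""
    eps = np.zeros_like(x)
    eps[0] = x[0]                        # eps_1 = x_1 - theta * eps_0, with eps_0 = 0
    for t in range(1, x.size):
        eps[t] = x[t] - theta * eps[t - 1]
    return (eps[1:] ** 2).sum()          # sum_{t=2}^T (x_t - theta * eps_{t-1})^2

rng = np.random.default_rng(2)
e = rng.standard_normal(2001)
x = e[1:] + 0.4 * e[:-1]                 # MA(1) with theta = 0.4
res = minimize_scalar(ma1_css, bounds=(-0.99, 0.99), args=(x,), method="bounded")
print(res.x)                             # theta_hat should be close to 0.4
```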

Example : Least squares estimation using the backcasting procedure

Consider the fundamental representation of an MA(q) :

$X_t = u_t$
$u_t = \epsilon_t + \theta_1\epsilon_{t-1} + \cdots + \theta_q\epsilon_{t-q}$

Given some initial values $\delta^{(0)} = \left(\theta_1^{(0)}, \cdots, \theta_q^{(0)}\right)'$ :

Compute the unconditional residuals $\hat{u}_t$ :

$\hat{u}_t = X_t$

Backcast values of $\epsilon_t$ for $t = -(q-1), \cdots, T$ using the backward recursion

$\tilde{\epsilon}_t = \hat{u}_t - \theta_1^{(0)}\tilde{\epsilon}_{t+1} - \cdots - \theta_q^{(0)}\tilde{\epsilon}_{t+q}$

where the $q$ values of $\epsilon_t$ beyond the estimation sample are set to zero, $\tilde{\epsilon}_{T+1} = \cdots = \tilde{\epsilon}_{T+q} = 0$, and $\hat{u}_t = 0$ for $t < 1$.

Example : Least squares estimation using the backcasting procedure (cont'd)

Estimate the values of $\epsilon_t$ using a forward recursion

$\hat{\epsilon}_t = \hat{u}_t - \theta_1^{(0)}\tilde{\epsilon}_{t-1} - \cdots - \theta_q^{(0)}\tilde{\epsilon}_{t-q}$

Minimize the sum of squared residuals using the fitted values $\hat{\epsilon}_t$ :

$\sum_{t=q+1}^{T}\left(X_t - \hat{\epsilon}_t - \theta_1\hat{\epsilon}_{t-1} - \cdots - \theta_q\hat{\epsilon}_{t-q}\right)^2$

and find $\hat{\delta}^{(1)} = \left(\theta_1^{(1)}, \cdots, \theta_q^{(1)}\right)'$. Repeat the backcast step, forward recursion, and minimization procedures until the estimates of the moving average part converge.

Remark : The backcast step can be turned off ($\tilde{\epsilon}_{-(q-1)} = \cdots = \tilde{\epsilon}_0 = 0$), in which case only the forward recursion is used.
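One defensible reading of this loop is sketched below. Since the inner minimization step is stated only loosely on the slide, the sketch minimizes the sum of squared innovations produced by the forward recursion (the usual conditional least squares objective with backcast start-up values); the nonzero starting values, optimizer, bounds, and convergence tolerance are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def backcast(theta, x):
    """Backward recursion for eps_tilde_t, t = T, ..., -(q-1); returns the q presample values."""
    q, T = theta.size, x.size
    eps_b = np.zeros(T + 2 * q)                 # index i corresponds to time t = i - (q - 1)
    for t in range(T, -q, -1):                  # t = T, ..., -(q-1)
        i = t + q - 1
        u_t = x[t - 1] if t >= 1 else 0.0       # u_t = X_t, and u_t = 0 for t < 1
        eps_b[i] = u_t - theta @ eps_b[i + 1:i + 1 + q]
    return eps_b[:q]                            # eps_{1-q}, ..., eps_0

def css(theta, x, eps_pre):
    """Sum of squared innovations from the forward recursion, given presample values."""
    q = theta.size
    eps = np.concatenate([eps_pre, np.zeros(x.size)])
    for t in range(x.size):
        eps[q + t] = x[t] - theta @ eps[t:q + t][::-1]
    return (eps[q:] ** 2).sum()

def ma_backcast_ls(x, q, theta0=0.1, n_sweeps=20, tol=1e-6):
    x, theta = np.asarray(x, dtype=float), np.full(q, theta0)
    for _ in range(n_sweeps):
        eps_pre = backcast(theta, x)            # backcast step with the current theta
        res = minimize(css, theta, args=(x, eps_pre),
                       method="L-BFGS-B", bounds=[(-0.98, 0.98)] * q)
        if np.max(np.abs(res.x - theta)) < tol:
            return res.x
        theta = res.x
    return theta

rng = np.random.default_rng(2)
e = rng.standard_normal(2002)
x = e[2:] + 0.5 * e[1:-1] + 0.25 * e[:-2]       # MA(2) with theta = (0.5, 0.25)
print(ma_backcast_ls(x, 2))
```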

3.3. Discussion

(Linear and nonlinear) least squares estimation is often used in practice. These methods lead to simple algorithms (e.g., the backcasting procedure for ARMA models) and can often be used to initialize more "sophisticated" estimation methods.

These methods are opposed to exact methods (see likelihood estimation).

Nonlinear estimation requires numerical optimization algorithms :

Initial conditions ;

Number of iterations and stopping criteria ;

Local and global convergence ;

Fundamental representation and constraints.

4. (Generalized) Method of moments
4.1. Methods of moments and Yule-Walker estimation

Definition

Suppose there is a set of $k$ conditions

$S_T - g(\delta) = 0_{k\times 1}$

where $S_T \in \mathbb{R}^k$ denotes a vector of theoretical moments, $\delta \in \mathbb{R}^k$ is a vector of parameters, and $g : \mathbb{R}^k \to \mathbb{R}^k$ defines a (bijective) mapping between $S_T$ and $\delta$. The method of moments estimation of $\delta$, $\hat{\delta}_{mm}$, is defined to be the value of $\delta$ such that

$\hat{S}_T - g\left(\hat{\delta}_{mm}\right) = 0_{k\times 1}$

where $\hat{S}_T$ is the estimation (empirical counterpart) of $S_T$.


Example 1 : MA(1) estimation

Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\theta| < 1$) :

$X_t = \epsilon_t + \theta\epsilon_{t-1}$

where $(\epsilon_t)$ is the innovation process of $(X_t)$.

One has

$S_T - g(\delta) = \begin{pmatrix} \gamma_X(0) - (1 + \theta^2)\sigma_\epsilon^2 \\ \gamma_X(1) - \theta\sigma_\epsilon^2 \end{pmatrix} = 0_{2\times 1}$

where $\delta = (\theta, \sigma_\epsilon^2)'$ and $S_T = (\gamma_X(0), \gamma_X(1))' = \left(\mathbb{E}\left[X_t^2\right], \mathbb{E}\left[X_t X_{t-1}\right]\right)'$.


The method of moments estimation of $\delta$, $\hat{\delta}_{mm}$, solves

$\hat{S}_T - g\left(\hat{\delta}_{mm}\right) = \begin{pmatrix} \hat{\gamma}_X(0) - (1 + \hat{\theta}_{mm}^2)\hat{\sigma}_{\epsilon,mm}^2 \\ \hat{\gamma}_X(1) - \hat{\theta}_{mm}\hat{\sigma}_{\epsilon,mm}^2 \end{pmatrix} = 0_{2\times 1}.$

Therefore (using the constraint $|\theta| < 1$)

$\hat{\theta}_{mm} = \frac{1 - \sqrt{1 - 4\hat{\rho}_X(1)^2}}{2\hat{\rho}_X(1)}$

and

$\hat{\sigma}_{\epsilon,mm}^2 = \frac{2\hat{\rho}_X(1)^2\,\hat{\gamma}_X(0)}{1 - \sqrt{1 - 4\hat{\rho}_X(1)^2}}.$
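A small numerical check of these closed-form estimators (the sample-moment computations, the equivalent expression $\hat{\sigma}_{\epsilon,mm}^2 = \hat{\gamma}_X(0)/(1+\hat{\theta}_{mm}^2)$, and the simulated series are illustrative assumptions):

```python
import numpy as np

def ma1_method_of_moments(x):
    """Closed-form MM estimates of (theta, sigma^2) for an MA(1), assuming |rho_X(1)| < 1/2."""
    x = np.asarray(x, dtype=float)
    T = x.size
    xbar = x.mean()
    g0 = ((x - xbar) ** 2).sum() / T
    g1 = ((x[:-1] - xbar) * (x[1:] - xbar)).sum() / T
    rho1 = g1 / g0
    disc = np.sqrt(1.0 - 4.0 * rho1 ** 2)          # requires |rho1| <= 1/2
    theta_hat = (1.0 - disc) / (2.0 * rho1)        # invertible root, |theta| < 1
    sigma2_hat = g0 / (1.0 + theta_hat ** 2)       # equivalent to the closed form above
    return theta_hat, sigma2_hat

rng = np.random.default_rng(3)
e = rng.standard_normal(5001)
x = e[1:] + 0.5 * e[:-1]                           # MA(1) with theta = 0.5
print(ma1_method_of_moments(x))
```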


Example 2 : Yule-Walker estimation

Let $(X_t)$ be a causal AR(p). The Yule-Walker equations write

$\mathbb{E}\left[X_{t-k}\left(X_t - \sum_{j=1}^{p}\phi_j X_{t-j}\right)\right] = \begin{cases}\sigma_\epsilon^2 & k = 0\\ 0 & k = 1, \cdots, p\end{cases}$

or

$S_T - g(\delta) = 0_{(p+1)\times 1} \;\Longleftrightarrow\; \begin{cases}\gamma_X(0) - \phi'\gamma_p = \sigma_\epsilon^2\\ \gamma_p - \Gamma_p\phi = 0_{p\times 1}\end{cases}$

where $\gamma_p = (\gamma_X(1), \cdots, \gamma_X(p))' = \left(\mathbb{E}[X_t X_{t-1}], \cdots, \mathbb{E}[X_t X_{t-p}]\right)'$, $S_T = (\gamma_X(0), \gamma_p')'$, $\phi = (\phi_1, \cdots, \phi_p)'$, $\delta = (\phi', \sigma_\epsilon^2)'$, and $\Gamma_p$ is the $p\times p$ autocovariance matrix $\left[\gamma_X(i-j)\right]_{i,j=1}^{p}$.


The method of moments estimation of $(\phi', \sigma_\epsilon^2)'$ solves

$\hat{\phi}_{mm} = \hat{\Gamma}_p^{-1}\hat{\gamma}_p$

and

$\hat{\sigma}_{\epsilon,mm}^2 = \hat{\gamma}_X(0) - \hat{\phi}_{mm}'\hat{\gamma}_p.$
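A compact Yule-Walker sketch for an AR(p) (NumPy only; the biased $1/T$ autocovariances and the simulated AR(2) series are illustrative choices):

```python
import numpy as np

def yule_walker(x, p):
    """Yule-Walker (method of moments) estimates of (phi_1..phi_p, sigma^2) for an AR(p)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    T = x.size
    gamma = np.array([(x[:T - h] * x[h:]).sum() / T for h in range(p + 1)])
    Gamma_p = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = gamma[1:]
    phi_hat = np.linalg.solve(Gamma_p, gamma_p)     # hat{Gamma}_p^{-1} hat{gamma}_p
    sigma2_hat = gamma[0] - phi_hat @ gamma_p       # hat{gamma}_X(0) - hat{phi}' hat{gamma}_p
    return phi_hat, sigma2_hat

rng = np.random.default_rng(4)
x = np.zeros(3000)
for t in range(2, 3000):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.standard_normal()
print(yule_walker(x, 2))
```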

Question : Difference(s) with ordinary least squares estimation of an AR(p) ?


Discussion

It is necessary to use theoretical moments that can be well-estimated from the data (or given the sample size).

Method of moments estimation depends on the mapping $g$ and especially on the "smoothness" of its inverse $g^{-1}$ (one needs to avoid an inverse map with a "large derivative").

Summarizing the data through the sample moments incurs a loss of information—the main exception is when the sample moments are sufficient (in a statistical sense).

Instead of reducing a sample of size $T$ to a "sample" of empirical moments of size $k$, one may reduce the sample to more "empirical moments" than there are parameters... the so-called generalized method of moments.

4.2. Generalized method of moments

Intuition : Consider the linear regression model in an i.i.d. context

$y_t = x_t' b_0 + \epsilon_t.$

The exogeneity assumption $\mathbb{E}[\epsilon_t \mid X] = 0$ implies the $k$ moment conditions (for $t = 1, \cdots, T$)

$\mathbb{E}\left[x_t\left(y_t - x_t' b_0\right)\right] = 0_{k\times 1}.$

Using the empirical counterpart (analogy principle),

$\frac{1}{T}\sum_{t=1}^{T} x_t\left(y_t - x_t' b_0\right) = 0_{k\times 1}.$

This is a system of $k$ equations with $k$ unknowns (just identified), whose solution is

$\hat{b}_{ols} = \left(\sum_{t=1}^{T} x_t x_t'\right)^{-1}\left(\sum_{t=1}^{T} x_t y_t\right).$


More generally, suppose that there exists a set of instruments $z_t \in \mathbb{R}^p$ (with $p \geq k$), the instruments being (i) uncorrelated with the error terms and (ii) correlated with the explanatory variables, such that

$\mathbb{E}\left[z_t\left(y_t - x_t' b_0\right)\right] = 0_{p\times 1} \;\Longleftrightarrow\; \mathbb{E}\left[h(z_t; \delta_0)\right] = 0_{p\times 1}.$

Using the empirical counterpart (analogy principle),

$\frac{1}{T}\sum_{t=1}^{T} z_t\left(y_t - x_t' b_0\right) = 0_{p\times 1} \;\Longleftrightarrow\; \frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta_0) = 0_{p\times 1}.$

This system is overidentified for $p > k$. How can we find an estimation of $b_0$ ?


The answer is provided by the generalized method of moments, i.e., an estimation method based on estimating equations that set to zero the expectation of a vector function of the observations and the parameters of interest $b_0$.

The basic idea is to choose a value for δ such that the empirical counterpart is as close as possible to zero.

"As close as possible to zero" means that one minimizes the distance between $\frac{1}{T}\sum_{t=1}^{T} z_t\left(y_t - x_t' b_0\right)$ and $0_{p\times 1}$ using a weighting matrix (say, $W_T$).

Two issues :
1. Choice of instruments ;
2. Choice of the (optimal) weighting matrix.


Definition

Consider the set of moment conditions

$\mathbb{E}\left[h(z_t; \delta_0)\right] = 0_{p\times 1}$

where $\delta_0 \in \mathbb{R}^k$, $p \geq k$, and $h$ is a $p$-dimensional (non)linear function of the parameters of interest that depends on the (i.i.d.) observations. The generalized method of moments estimator of $\delta_0$ associated with $W_T$ (a symmetric positive definite matrix) is a solution to the problem

$\min_{\delta}\;\left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta)\right]' W_T \left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta)\right].$


Definition

The generalized method of moments estimator of $\delta_0$ solves the first-order conditions

$\left[\frac{1}{T}\sum_{t=1}^{T}\frac{\partial h(z_t; \hat{\delta}_{gmm})}{\partial\delta}\right]' W_T \left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \hat{\delta}_{gmm})\right] = 0_{k\times 1}.$

This estimator is consistent and asymptotically normally distributed.

Remark : The first-order conditions correspond to $k$ linear combinations of the moment conditions.


This estimator depends on $W_T$. There exists a best GMM estimator, i.e. an optimal weighting matrix.

The optimal weighting matrix is given by (i.i.d. context)

$W_T^* = \mathbb{V}\left[h(z_t; \delta_0)\right]^{-1} \equiv \mathbb{E}\left[h(z_t; \delta_0)h(z_t; \delta_0)'\right]^{-1}$

and the sample analog is

$\left[\frac{1}{T}\sum_{t=1}^{T} h(z_t; \delta_0)h(z_t; \delta_0)'\right]^{-1}.$

It does depend on $\delta_0$ !

How can we proceed in practice ?
Two-step GMM or iterated GMM methods ;
One-step methods (continuously updating estimator, exponential tilting estimator, empirical likelihood estimator, etc.).


Example : AR(1) estimation

Let $(X_t)$ be a covariance-stationary process defined by the fundamental representation ($|\phi| < 1$) :

$X_t = \phi X_{t-1} + \epsilon_t.$

Since $(\epsilon_t)$ is the innovation process of $(X_t)$, one has the following moment conditions

$\mathbb{E}\left[x_{t-k}\epsilon_t\right] = 0 \quad \text{for } k > 0.$

$\Rightarrow$ Past values $x_{t-1}, \cdots, x_{t-h}$ can be used as instruments.

Let $z_t$ denote the set of instruments,

$z_t = (x_{t-1}, \cdots, x_{t-h})'.$


The moment conditions write

$\mathbb{E}\left[h(z_t; \delta_0)\right] = 0_{h\times 1}$

where

$h(z_t; \delta_0) = z_t\left(x_t - \phi_1 x_{t-1}\right).$

The empirical counterpart is defined to be

$\frac{1}{T}\sum_{t=h+1}^{T}\begin{pmatrix} x_{t-1} \\ \vdots \\ x_{t-h}\end{pmatrix}\left(x_t - \phi_1 x_{t-1}\right) = 0_{h\times 1}.$


The two-step (2S) GMM estimator of $\phi$ is a solution to the problem

$\left[\sum_{t=h+1}^{T}\frac{\partial h(z_t; \hat{\delta}_{gmm})}{\partial\delta}\right]' \hat{\Omega}_T^{-1}\left[\sum_{t=h+1}^{T} h(z_t; \hat{\delta}_{gmm})\right] = 0$

where $\hat{\Omega}_T^{-1}$ is a first-step consistent estimate of the inverse of the variance-covariance matrix of the moment conditions; it accounts for the dependence of the moment conditions.

Remark : In a first step, one solves

$\left[\sum_{t=h+1}^{T}\frac{\partial h(z_t; \hat{\delta}_{1,gmm})}{\partial\delta}\right]' I\left[\sum_{t=h+1}^{T} h(z_t; \hat{\delta}_{1,gmm})\right] = 0$

and then computes $\hat{\Omega}_T\left(\hat{\delta}_{1,gmm}\right)^{-1} \equiv \hat{\Omega}_T^{-1}$.
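A hedged two-step GMM sketch for this AR(1) example with $h = 2$ instruments (the identity first-step weighting, the outer-product variance estimate, the optimizer, and the simulated data are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gmm_ar1_two_step(x, h=2):
    """Two-step GMM for X_t = phi X_{t-1} + eps_t with instruments (x_{t-1}, ..., x_{t-h})."""
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([x[h - k:len(x) - k] for k in range(1, h + 1)])  # z_t, t = h+1..T
    y, ylag = x[h:], x[h - 1:-1]

    def gbar(phi, W):
        g = Z * (y - phi * ylag)[:, None]           # moment contributions, one row per t
        m = g.mean(axis=0)
        return m @ W @ m, g

    def step(W):
        obj = lambda phi: gbar(phi, W)[0]
        return minimize_scalar(obj, bounds=(-0.99, 0.99), method="bounded").x

    phi1 = step(np.eye(h))                          # first step: identity weighting
    _, g1 = gbar(phi1, np.eye(h))
    Omega = (g1.T @ g1) / g1.shape[0]               # hat{Omega}_T at the first-step estimate
    return step(np.linalg.inv(Omega))               # second step: optimal weighting

rng = np.random.default_rng(5)
x = np.zeros(3000)
for t in range(1, 3000):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(gmm_ar1_two_step(x))
```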


Discussion

Choice of instruments (optimal instruments ?) and of the weighting matrix ;

Bias-efficiency trade-off ;

Numerical procedures can be cumbersome !

Other methods can be written as a GMM estimation.

5. Maximum likelihood estimation
5.1. Overview

Consider the linear regression model ($t = 1, \cdots, T$) :

$y_t = x_t' b_0 + \epsilon_t$

where $(y_t, x_t')$ are i.i.d., the regressors are stochastic, $x_t'$ is the $t$-th row of the $X$ matrix, and

$\epsilon_t \mid x_t \sim \text{i.i.d. } \mathcal{N}(0, \sigma_\epsilon^2).$

It follows that

$y_t \mid x_t \sim \text{i.i.d. } \mathcal{N}(x_t' b_0, \sigma_\epsilon^2).$


One observes a sample of (conditionally) i.i.d. observations $(y_t, x_t')$. These observations are realizations of random variables with a known distribution but unknown parameters.

The basic idea is to maximize the probability of observing these realizations (given a known parametric distribution with unknown parameters).

The probability of observing these realizations is provided by the joint density (pdf) of the sample, evaluated at the observations.

This joint density is the likelihood function, except that it is viewed as a function of the unknown parameters (and not of the observations !).

5.2. Estimation

Definition

The exact likelihood function of the linear regression model is defined to be

$L(y, x; \delta_0) = \prod_{t=1}^{T} L_t(y_t, x_t; \delta_0)$

where $L_t$ is the (exact) likelihood function for observation $t$,

$L_t(y_t, x_t; \delta_0) = f_{Y_t\mid X_t}(y_t \mid x_t; \delta_0) \times f_{X_t}(x_t),$

and $f_{Y_t\mid X_t}$ (respectively, $f_{X_t}$) is the conditional probability density function of $Y_t \mid X_t$ (respectively, the density of $X_t$). Accordingly, the exact log-likelihood function is defined to be

$\ell(y, x; \delta_0) = \sum_{t=1}^{T}\ell_t(y_t, x_t; \delta_0) \equiv \sum_{t=1}^{T}\log L_t(y_t, x_t; \delta_0).$

The exact likelihood function is given by

$L(y, x; \delta_0) = \prod_{t=1}^{T} f_{Y_t\mid X_t}(y_t \mid x_t; \delta_0)\prod_{t=1}^{T} f_{X_t}(x_t).$

The second right-hand side term is often neglected (especially if $f_X$ does not depend on $\delta_0$) and one considers the conditional likelihood function

$L(y \mid x; \delta_0) = \prod_{t=1}^{T} f_{Y_t\mid X_t}(y_t \mid x_t; \delta_0)$
$\qquad = \prod_{t=1}^{T}\left(2\pi\sigma_\epsilon^2\right)^{-1/2}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\left(y_t - x_t' b_0\right)^2\right)$
$\qquad = \left(2\pi\sigma_\epsilon^2\right)^{-T/2}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\sum_{t=1}^{T}\left(y_t - x_t' b_0\right)^2\right)$

and the conditional log-likelihood function is given by

$\ell(y \mid x; \delta_0) = -\frac{T}{2}\log(\sigma_\epsilon^2) - \frac{T}{2}\log(2\pi) - \frac{1}{2\sigma_\epsilon^2}\sum_{t=1}^{T}\left(y_t - x_t' b_0\right)^2.$
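For this Gaussian linear regression, maximizing the conditional log-likelihood gives $\hat{b}_{ml}$ equal to the OLS estimator and $\hat{\sigma}_{\epsilon,ml}^2$ equal to the residual sum of squares divided by $T$; a minimal sketch (the function name and the simulated data are illustrative assumptions):

```python
import numpy as np

def gaussian_lr_cmle(y, X):
    """Conditional ML estimates in y_t = x_t' b + eps_t with Gaussian errors:
    b_hat equals OLS and sigma2_hat = SSR / T (no degrees-of-freedom correction)."""
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b_hat
    sigma2_hat = resid @ resid / y.size
    loglik = -0.5 * y.size * (np.log(2 * np.pi * sigma2_hat) + 1.0)  # value at the maximum
    return b_hat, sigma2_hat, loglik

rng = np.random.default_rng(8)
X = np.column_stack([np.ones(200), rng.standard_normal(200)])
y = X @ np.array([1.0, 2.0]) + rng.standard_normal(200)
print(gaussian_lr_cmle(y, X))
```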


Definition

The maximum likelihood estimator of $\delta_0$ in the linear regression model, which is the solution of the problem

$\max_{\delta} L(y \mid x; \delta) \;\Longleftrightarrow\; \max_{\delta}\ell(y \mid x; \delta),$

is asymptotically unbiased and efficient, and normally distributed,

$\sqrt{T}\left(\hat{\delta}_{ml} - \delta_0\right) \xrightarrow{\ell} \mathcal{N}\left(0_{k\times 1},\ \mathcal{I}_F^{-1}\right),$

where $\mathcal{I}_F$ is the Fisher information matrix.

Example : AR(1) estimation

Consider the following AR(1) representation ($|\phi| < 1$) :

$X_t = \phi X_{t-1} + \epsilon_t$

where $\epsilon_t$ or $\epsilon_t \mid X_{t-1} \sim \text{i.i.d. } \mathcal{N}(0, \sigma_\epsilon^2)$. Therefore

$X_t \mid X_{t-1} = x_{t-1} \sim \mathcal{N}\left(\phi x_{t-1},\ \sigma_\epsilon^2\right)$
$X_1 \sim \mathcal{N}\left(0,\ \frac{\sigma_\epsilon^2}{1-\phi^2}\right).$

Accordingly,

$f_{X_t\mid X_{t-1}}(x_t \mid x_{t-1}; \delta_0) = \left(2\pi\sigma_\epsilon^2\right)^{-1/2}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\left(x_t - \phi x_{t-1}\right)^2\right)$
$f_{X_1}(x_1; \delta_0) = \left(\frac{2\pi\sigma_\epsilon^2}{1-\phi^2}\right)^{-1/2}\exp\left(-\frac{1-\phi^2}{2\sigma_\epsilon^2}x_1^2\right).$

Exact (log-)likelihood function

The exact likelihood function is given by

$L(x; \delta_0) = L_1(x_1; \delta_0)\times\prod_{\tau=2}^{T} L_\tau(x_\tau \mid x_{\tau-1}; \delta_0) = f_{X_1}(x_1; \delta_0)\times\prod_{\tau=2}^{T} f_{X_\tau\mid X_{\tau-1}}(x_\tau \mid x_{\tau-1}; \delta_0)$
$\quad = \left(\frac{2\pi\sigma_\epsilon^2}{1-\phi^2}\right)^{-1/2}\exp\left(-\frac{1-\phi^2}{2\sigma_\epsilon^2}x_1^2\right)\times\left(2\pi\sigma_\epsilon^2\right)^{-\frac{T-1}{2}}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\sum_{t=2}^{T}\left(x_t - \phi x_{t-1}\right)^2\right).$

The exact log-likelihood is

$\ell(x; \delta_0) = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\log\left(\frac{\sigma_\epsilon^2}{1-\phi^2}\right) - \frac{1-\phi^2}{2\sigma_\epsilon^2}x_1^2 - \frac{T-1}{2}\log(\sigma_\epsilon^2) - \frac{1}{2\sigma_\epsilon^2}\sum_{t=2}^{T}\left(x_t - \phi x_{t-1}\right)^2.$

Conditional (log-)likelihood function

The conditional likelihood function is given by

$L(x \mid x_1; \delta_0) = \prod_{\tau=2}^{T} L_\tau(x_\tau \mid x_{\tau-1}; \delta_0) = \prod_{\tau=2}^{T} f_{X_\tau\mid X_{\tau-1}}(x_\tau \mid x_{\tau-1}; \delta_0) = \left(2\pi\sigma_\epsilon^2\right)^{-\frac{T-1}{2}}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\sum_{t=2}^{T}\left(x_t - \phi x_{t-1}\right)^2\right).$

The conditional log-likelihood is

$\ell(x \mid x_1; \delta_0) = -\frac{T-1}{2}\log(2\pi) - \frac{T-1}{2}\log(\sigma_\epsilon^2) - \frac{1}{2\sigma_\epsilon^2}\sum_{t=2}^{T}\left(x_t - \phi x_{t-1}\right)^2.$
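A numerical sketch contrasting the exact and conditional Gaussian ML estimators of the AR(1) (the optimizer, bounds, log-variance parameterization, and simulated data are illustrative assumptions); consistently with the remarks that follow, the conditional estimate of $\phi$ coincides with the OLS estimate, while the exact estimate differs slightly because of the term in $x_1$:

```python
import numpy as np
from scipy.optimize import minimize

def ar1_negloglik(params, x, exact=True):
    """Negative Gaussian log-likelihood of an AR(1); exact=False drops the term in x_1."""
    phi, log_s2 = params
    s2 = np.exp(log_s2)                          # work with log(sigma^2) so that sigma^2 > 0
    resid = x[1:] - phi * x[:-1]
    ll = -0.5 * (x.size - 1) * np.log(2 * np.pi * s2) - (resid ** 2).sum() / (2 * s2)
    if exact:                                    # stationary density of X_1
        ll += -0.5 * np.log(2 * np.pi * s2 / (1 - phi ** 2)) - (1 - phi ** 2) * x[0] ** 2 / (2 * s2)
    return -ll

rng = np.random.default_rng(6)
x = np.zeros(300)
for t in range(1, 300):
    x[t] = 0.8 * x[t - 1] + rng.standard_normal()

for exact in (True, False):
    res = minimize(ar1_negloglik, x0=np.array([0.0, 0.0]), args=(x, exact),
                   method="L-BFGS-B", bounds=[(-0.99, 0.99), (-10.0, 10.0)])
    print("exact" if exact else "conditional", res.x[0], np.exp(res.x[1]))
```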


The two methods lead to different maximum likelihood estimates

$\hat{\delta}_{cml} \neq \hat{\delta}_{eml}.$

The conditional maximum likelihood estimator of $\phi$ is also the ordinary least squares estimator of $\phi$,

$\hat{\phi}_{cml} = \hat{\phi}_{ols}.$

This also holds for any AR(p).

The Yule-Walker, conditional maximum likelihood, and (ordinary) least squares estimators are asymptotically equivalent.

Discussion

Generally the maximum likelihood estimates are built as if the observations were Gaussian, though it is not necessary that $(X_t)$ be Gaussian when doing the estimation.

If the model is misspecified, the only difference (to some extent...) is that estimates may be less efficient. In this case, one can use the pseudo- or quasi-maximum likelihood estimation.

In the case of linear models, there may not be much gain from using exact maximum likelihood over the conditional likelihood method (or the least squares method).

Maximum likelihood estimation often requires nonlinear numerical optimization procedures.

Estimation of the Fisher information matrix can be cumbersome (especially in finite samples).

Appendix : ML estimation of an MA(q)

Step 1 : Writing $\epsilon = (\epsilon_1, \cdots, \epsilon_T)'$

One has

$\epsilon_{1-q} = \epsilon_{1-q}$
$\quad\vdots$
$\epsilon_0 = \epsilon_0$
$\epsilon_1 = x_1 - \theta_1\epsilon_0 - \cdots - \theta_q\epsilon_{1-q}$
$\epsilon_2 = x_2 - \theta_1\epsilon_1 - \cdots - \theta_q\epsilon_{2-q}$
$\quad\vdots$
$\epsilon_T = x_T - \theta_1\epsilon_{T-1} - \cdots - \theta_q\epsilon_{T-q}.$


Therefore

$\begin{pmatrix}\epsilon_* \\ \epsilon\end{pmatrix} = N X + Z\epsilon_* = \begin{pmatrix} Z & N\end{pmatrix}\begin{pmatrix}\epsilon_* \\ X\end{pmatrix}$

with $X = (x_1, \cdots, x_T)'$, $\epsilon_* = (\epsilon_{1-q}, \epsilon_{2-q}, \cdots, \epsilon_{-1}, \epsilon_0)'$, and

$N = \begin{pmatrix} 0_{q\times T} \\ A_1(\theta)\end{pmatrix}, \qquad Z = \begin{pmatrix} I_q \\ A_2(\theta)\end{pmatrix},$

where $A_1(\theta)$ is a lower triangular matrix ($T\times T$), $A_2(\theta)$ is a $T\times q$ matrix, and $I_q$ is the identity matrix of order $q$.


Step 2 : Determination of the joint density of $(\epsilon_*', \epsilon')'$

The joint probability density function of $(\epsilon_*', \epsilon')'$ is defined to be

$\left(2\pi\sigma_\epsilon^2\right)^{-\frac{T+q}{2}}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\left(NX + Z\epsilon_*\right)'\left(NX + Z\epsilon_*\right)\right).$

Step 3 : Determination of the marginal density of $X$

The marginal density of $X$ is defined to be

$\left(2\pi\sigma_\epsilon^2\right)^{-\frac{T}{2}}\det\left(Z'Z\right)^{-\frac{1}{2}}\exp\left(-\frac{1}{2\sigma_\epsilon^2}\left(NX + Z\hat{\epsilon}_*\right)'\left(NX + Z\hat{\epsilon}_*\right)\right)$

with $\hat{\epsilon}_* = -(Z'Z)^{-1}Z'NX$.


Step 4 : Determination of the log-likelihood function

The log-likelihood function is defined to be

$\ell(x; \delta) = -\frac{T}{2}\log(2\pi) - \frac{T}{2}\log(\sigma_\epsilon^2) - \frac{1}{2}\log\det\left(Z'Z\right) - \frac{S(\theta)}{2\sigma_\epsilon^2}$

where $\delta = (\theta', \sigma_\epsilon^2)'$ and

$S(\theta) = \left(NX + Z\hat{\epsilon}_*\right)'\left(NX + Z\hat{\epsilon}_*\right).$

Step 5 : Determination of $\hat{\sigma}_{\epsilon,ml}^2$

Using the first-order condition w.r.t. $\sigma_\epsilon^2$, one has

$\hat{\sigma}_{\epsilon,ml}^2 = \frac{S\left(\hat{\theta}_{ml}\right)}{T}.$


Step 6 : Determination of the concentrated log-likelihood function

Up to an additive constant and a positive scale factor, the concentrated log-likelihood function is defined to be

$\ell^*(x; \theta) = -T\log\left[S(\theta)\right] - \log\det\left(Z'Z\right).$

Step 7 : Determination of $\hat{\theta}_{ml}$

$\hat{\theta}_{ml}$ solves

$\min_{\theta}\; T\log\left[S(\theta)\right] + \log\det\left(Z'Z\right).$

Remarks :
1. Least squares methods only focus on the first right-hand side term of the log-likelihood function. Exact methods account for both right-hand side terms.
2. This method generalizes to ARMA(p, q) models.
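A hedged sketch of Steps 1-7 for an MA(q): the vector $A_1(\theta)X$ and the columns of $A_2(\theta)$ are built by running the forward recursion of Step 1 on basis vectors, and $\hat{\theta}_{ml}$ is obtained by minimizing the concentrated criterion numerically (the simulated MA(2) series, the optimizer, and the starting values are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize

def _forward_eps(x, eps_pre, theta):
    """eps_t = x_t - theta_1 eps_{t-1} - ... - theta_q eps_{t-q}, given presample values eps_pre."""
    q = theta.size
    eps = np.concatenate([eps_pre, np.zeros(x.size)])
    for t in range(x.size):
        eps[q + t] = x[t] - theta @ eps[t:q + t][::-1]
    return eps[q:]

def ma_exact_pieces(theta, x):
    """S(theta) = ||NX + Z eps*_hat||^2 and log det(Z'Z) for the MA(q) exact likelihood."""
    theta = np.asarray(theta, dtype=float)
    T, q = x.size, theta.size
    a1x = _forward_eps(x, np.zeros(q), theta)                                        # A_1(theta) X
    A2 = np.column_stack([_forward_eps(np.zeros(T), e, theta) for e in np.eye(q)])   # A_2(theta)
    ZtZ = np.eye(q) + A2.T @ A2                                                      # Z'Z
    eps_star = -np.linalg.solve(ZtZ, A2.T @ a1x)       # eps*_hat = -(Z'Z)^{-1} Z'N X
    resid = np.concatenate([eps_star, a1x + A2 @ eps_star])   # NX + Z eps*_hat
    return resid @ resid, np.linalg.slogdet(ZtZ)[1]

def ma_exact_ml(x, q):
    x = np.asarray(x, dtype=float)
    def crit(th):
        S, logdet = ma_exact_pieces(th, x)
        return x.size * np.log(S) + logdet             # T log S(theta) + log det(Z'Z)
    res = minimize(crit, x0=np.full(q, 0.1), method="Nelder-Mead")
    S, _ = ma_exact_pieces(res.x, x)
    return res.x, S / x.size                            # (theta_hat, sigma2_hat = S / T)

rng = np.random.default_rng(7)
e = rng.standard_normal(402)
x = e[2:] + 0.4 * e[1:-1] + 0.3 * e[:-2]                # MA(2) with theta = (0.4, 0.3)
print(ma_exact_ml(x, 2))
```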
