Time Series Analysis
1. Stationary ARMA models
Andrew Lesniewski
Baruch College New York
Fall 2019
Outline
1 Basic concepts
2 Autoregressive models
3 Moving average models
Time series
A time series is a sequence of data points Xt indexed by a discrete set of (ordered) dates t, where −∞ < t < ∞.
Each Xt can be a simple number or a complex multi-dimensional object (a vector, matrix, higher dimensional array, or more general structure).
We will assume throughout that the times t are equally spaced, and denote the time increment by h (e.g. a second, a day, a month). Unless specified otherwise, we choose the units of time so that h = 1.
Typically, time series exhibit significant irregularities, which may have their origin in the nature of the underlying quantity, in imprecision of observation, or both.
Examples of time series commonly encountered in finance include: (i) prices, (ii) returns, (iii) index levels, (iv) trading volumes, (v) open interests, (vi) macroeconomic data (inflation, new payrolls, unemployment, GDP, housing prices, ...).
For modeling purposes, we assume that the elements of a time series are random variables on some underlying probability space.
Time series analysis is a set of mathematical methodologies for analyzing observed time series, whose purpose is to extract useful characteristics of the data. These methodologies fall into two broad categories:
(i) non-parametric, where the stochastic law of the time series is not explicitly specified;
(ii) parametric, where the stochastic law of the time series is assumed to be given by a model with a finite (and preferably tractable) number of parameters.
The results of time series analysis are used for various purposes such as (i) data interpretation, (ii) forecasting, (iii) smoothing, (iv) back filling, ...
We begin with stationary time series.
Stationarity and ergodicity
A time series (model) is stationary if, for any times t1 < . . . < tk and any τ, the joint probability distribution of (X_{t_1+τ}, ..., X_{t_k+τ}) is identical with the joint probability distribution of (X_{t_1}, ..., X_{t_k}).
In other words, the joint probability distribution of (X_{t_1}, ..., X_{t_k}) remains the same if each observation time ti is shifted by the same amount (time translation invariance).
For a stationary time series, the expected value E(Xt ) is independent of t and is called the (ensemble) mean of Xt . We will denote its value by µ. A stationary time series model is ergodic if
\lim_{T \to \infty} \frac{1}{T} \sum_{1 \le k \le T} X_{t+k} = \mu,   (1)
i.e. if the time average of Xt equals its ensemble mean. The limit in (1) is usually understood in the sense of mean square convergence.
Ergodicity is a desired property of a financial time series, as we are always faced with a single realization of a process rather than an ensemble of alternative outcomes.
The notions of stationarity and ergodicity are hard to verify in practice. In particular, there is no practical statistical test for ergodicity.
For this reason, a weaker but more practical concept of stationarity has been introduced.
Autocovariance and stationarity
A time series is covariance-stationary (a.k.a. weakly stationary), if:
(i) E(Xt) = µ is a constant;
(ii) for any τ, the autocovariance Cov(Xs, Xt) is time translation invariant,

Cov(X_{s+τ}, X_{t+τ}) = Cov(X_s, X_t),   (2)

i.e. Cov(Xs, Xt) depends only on the difference t − s. We will denote it by Γ_{t−s}.
For covariance stationary series, Γ−t = Γt (show it!).
Notice that Γ0 = Var(Xt ).
The autocorrelation function (ACF) of a time series is defined as
R_{s,t} = \frac{Cov(X_s, X_t)}{\sqrt{Var(X_s)}\,\sqrt{Var(X_t)}}.   (3)
For covariance-stationary time series, Rs,t = Rs−t,0, i.e. the ACF is a function of the difference s − t only.
We will write Rt = Rt,0, and note that
R_t = \frac{\Gamma_t}{\Gamma_0}.   (4)
Note that µ, Γ, and R are usually unknown, and are estimated from sample data. The estimated sample mean µ̂, autocovariance Γ̂, and autocorrelation R̂ are calculated as follows.
Consider a finite sample x1,..., xT . Then
\hat{\mu} = \frac{1}{T} \sum_{t=1}^{T} x_t,

\hat{\Gamma}_t =
\begin{cases}
\frac{1}{T} \sum_{j=t+1}^{T} (x_j - \hat{\mu})(x_{j-t} - \hat{\mu}), & \text{for } t = 0, 1, \dots, T-1, \\
\hat{\Gamma}_{-t}, & \text{for } t = -1, \dots, -(T-1),
\end{cases}   (5)

\hat{R}_t = \frac{\hat{\Gamma}_t}{\hat{\Gamma}_0}.
These quantities are called the sample mean, sample autocovariance, and sample ACF, respectively.
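As an illustration, here is a minimal NumPy sketch of the estimators in (5); the helper name sample_acf and its interface are ours, not part of any library.

import numpy as np

def sample_acf(x, max_lag):
    # Sample mean, autocovariance, and ACF as in (5); x is a 1-d array of observations
    x = np.asarray(x, dtype=float)
    T = len(x)
    mu_hat = x.mean()
    xc = x - mu_hat
    # Gamma_hat[t] = (1/T) * sum_{j=t+1}^{T} (x_j - mu_hat)(x_{j-t} - mu_hat)
    gamma_hat = np.array([np.dot(xc[t:], xc[:T - t]) / T for t in range(max_lag + 1)])
    r_hat = gamma_hat / gamma_hat[0]
    return mu_hat, gamma_hat, r_hat

For example, sample_acf(x, 10) returns the sample mean together with the first ten sample autocovariances and autocorrelations of the series x.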
Usually, R̂t is a biased estimator of Rt, with the bias going to zero as 1/T for T → ∞. Notice that this method allows us to compute up to T − 1 estimated sample autocorrelations.
One can use the above estimators to test the hypothesis H0 : Rt = 0 versus Ha : Rt ≠ 0. The relevant t-stat is

r = \frac{\hat{R}_t}{\sqrt{\frac{1}{T}\left(1 + 2 \sum_{i=1}^{t-1} \hat{R}_i^2\right)}}.
If Xt is a stationary Gaussian time series with Rs = 0 for s > t, this t-stat is normally distributed, asymptotically as T → ∞.
We thus reject H0 with confidence 1 − α if |r| > Z_{α/2}, where Z_{α/2} is the 1 − α/2 percentile of the standard normal distribution.
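A short sketch of this test, assuming the sample ACF r_hat has been computed as above (with r_hat[0] = 1); the function name acf_tstat is hypothetical.

import numpy as np
from scipy.stats import norm

def acf_tstat(r_hat, t, T, alpha=0.05):
    # t-stat for H0: R_t = 0, and the rejection decision |r| > Z_{alpha/2}
    se = np.sqrt((1.0 + 2.0 * np.sum(r_hat[1:t] ** 2)) / T)
    r = r_hat[t] / se
    return r, abs(r) > norm.ppf(1.0 - alpha / 2.0)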
Another test, the Portmanteau test, allows us to test jointly for the presence of several autocorrelations, i.e. H0 : R1 = ... = Rk = 0, versus Ha : Ri ≠ 0 for some 1 ≤ i ≤ k. The relevant test statistic is defined as
Q^*(k) = T \sum_{i=1}^{k} \hat{R}_i^2.
Under the assumption that Xt is i.i.d., Q^*(k) is asymptotically distributed according to χ²(k).
The power of the test is increased if we replace the statistic above with the Ljung-Box statistic:

Q(k) = T(T+2) \sum_{i=1}^{k} \frac{\hat{R}_i^2}{T-i}.
H0 is rejected if Q(k) is greater than the 1 − α percentile of the χ²(k) distribution.
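A minimal sketch of the Ljung-Box test based on the sample ACF computed earlier; alternatively, statsmodels provides acorr_ljungbox in statsmodels.stats.diagnostic. The function name below is ours.

import numpy as np
from scipy.stats import chi2

def ljung_box(r_hat, T, k, alpha=0.05):
    # Ljung-Box statistic Q(k) and the chi^2(k) rejection decision
    lags = np.arange(1, k + 1)
    Q = T * (T + 2.0) * np.sum(r_hat[1:k + 1] ** 2 / (T - lags))
    return Q, Q > chi2.ppf(1.0 - alpha, df=k)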
Models of time series
For practical applications, it is convenient to model a time series as a discrete-time stochastic process with a small number of parameters. Time series models typically have the following structure:
Xt = pt + mt + εt , (6)
where the three components on the RHS have the following meaning:
pt is a periodic function called the seasonality;
mt is a slowly varying process called the trend;
εt is a stochastic component called the error or disturbance.
Classic linear time series models fall into three broad categories: autoregressive, moving average, and integrated, as well as their combinations.
White noise
The source of randomness in the models discussed in these lectures is white noise. It is a process specified as follows:
Xt = εt , (7)
where εt ∼ N(0, σ²) are i.i.d. (= independent, identically distributed) normal random variables. Note that
E(\varepsilon_t) = 0,
\qquad
Cov(\varepsilon_s, \varepsilon_t) =
\begin{cases}
\sigma^2, & \text{if } s = t, \\
0, & \text{otherwise}.
\end{cases}   (8)
The white noise process is stationary and ergodic (show it!). The white noise process with linear drift
Xt = at + b + εt ,  a ≠ 0,   (9)
is not stationary, as E(Xt ) = at + b.
Autoregressive model AR(1)
The first class of models that we consider are the autoregressive models AR(p). Their key characteristic is that the current observation is directly correlated with the p lagged observations. The simplest among them is AR(1), the autoregressive model with a single lag. The model is specified as follows:
Xt = α + βXt−1 + εt . (10)
Here, α, β ∈ R, and εt ∼ N(0, σ²) is a white noise. A particular case of the AR(1) model is the random walk model, namely
Xt = Xt−1 + εt ,
in which the current value of X is the previous value plus a “white noise” disturbance.
The graph below shows a simulated AR(1) time series with the following choice of parameters: α = 0.1, β = 0.3, σ = 0.005.
Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA   # used later for MLE fitting

alpha = 0.1
beta = 0.3
sigma = 0.005

# Simulate AR(1)
T = 250
x0 = alpha / (1 - beta)   # start at the stationary mean
x = np.zeros(T + 1)
x[0] = x0
eps = np.random.normal(0.0, sigma, T)
for i in range(1, T + 1):
    x[i] = alpha + beta * x[i - 1] + eps[i - 1]

# Take a look at the simulated time series
plt.plot(x)
plt.show()
Let us investigate the circumstances under which an AR(1) process is covariance-stationary.
For µ = E(Xt ) to be independent of t we must have from (10):
µ = α + βµ.
This equation has a solution iff β ≠ 1 (except for the random walk case α = 0, β = 1, for which any µ satisfies it). In this case,
\mu = \frac{\alpha}{1 - \beta}.   (11)
Let us now compute the autocovariance. To this end, we rewrite (10) as
Xt − µ = β(Xt−1 − µ) + εt . (12)
Notice that the two terms on the RHS of this equation are independent of each other.
Requiring that Γ0 = Var(Xt) be independent of t then implies that

\Gamma_0 = \beta^2 \Gamma_0 + \sigma^2,

and so

\Gamma_0 = \frac{\sigma^2}{1 - \beta^2}.   (13)
Since Γ0 > 0, this equation implies that |β| < 1.
Multiplying (12) by Xt−1 − µ and taking expectations, we find that Γ1 = βΓ0. Iterating, we find that
\Gamma_k = \beta^k \Gamma_0,   (14)
with Γ0 given by (13). The autocorrelation function thus decays exponentially fast as a function of the lag between two observations.
In conclusion, the condition for an AR(1) process to be covariance-stationary is that |β| < 1.
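As a quick sanity check, the sketch below compares the theoretical AR(1) moments (13)-(14) with sample estimates from a long simulated path; the parameter values match the earlier simulation, and the loop mirrors its code.

import numpy as np

alpha, beta, sigma = 0.1, 0.3, 0.005
gamma0 = sigma ** 2 / (1.0 - beta ** 2)            # (13)
acf_theory = beta ** np.arange(6)                  # (14): R_k = beta^k

# Simulate a long AR(1) path and compare with its sample ACF
T = 100000
x = np.zeros(T + 1)
x[0] = alpha / (1.0 - beta)
eps = np.random.normal(0.0, sigma, T)
for i in range(1, T + 1):
    x[i] = alpha + beta * x[i - 1] + eps[i - 1]
xc = x - x.mean()
acf_sample = np.array([np.dot(xc[k:], xc[:len(xc) - k]) / np.dot(xc, xc)
                       for k in range(6)])
print(acf_theory)
print(acf_sample)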
The AR(1) with |β| < 1 has a natural interpretation that can be gleaned from the following “explicit” representation of Xt . Namely, iterating (10) we find that:
X_t = \alpha + \beta X_{t-1} + \varepsilon_t
    = \alpha(1 + \beta) + \beta^2 X_{t-2} + \varepsilon_t + \beta \varepsilon_{t-1}
    = \dots
    = \alpha(1 + \beta + \dots + \beta^{L-1}) + \beta^L X_{t-L} + \varepsilon_t + \beta \varepsilon_{t-1} + \dots + \beta^{L-1} \varepsilon_{t-L+1}
    = \mu(1 - \beta^L) + \beta^L X_{t-L} + \sqrt{\Gamma_0 (1 - \beta^{2L})}\, \xi_t,   (15)
where ξt ∼ N(0, 1). This implies that
E(X_t \mid X_{t-L}) = \mu(1 - \beta^L) + \beta^L X_{t-L},
\qquad
Var(X_t \mid X_{t-L}) = \Gamma_0 (1 - \beta^{2L}).   (16)
Since β^L → 0 exponentially fast, for large L we have

X_t \approx \mu + \sqrt{\Gamma_0}\, \xi_t.   (17)
In other words, the AR(1) model describes a mean reverting time series. After a large number of observations, Xt takes the form (17), i.e. it is equal to its mean value plus a Gaussian noise. The rate of convergence to this limit is given by |β|: the smaller this value, the faster Xt reaches its limit behavior. The next question is: given a set of observations, how do we determine the values of the parameters α, β, and σ in (10)?
Maximum likelihood estimation
Maximum likelihood estimation (MLE) is a commonly used method of estimating the parameters of a statistical model given a set of observations. It is based on the premise that the best choice of the parameter values should maximize the likelihood of making the observations given these parameters.
Given a statistical model with parameters θ = (θ1, . . . , θd ), and a set of data y = (y1,..., yN ), we construct the likelihood function L(θ|y), which links the model with the data in such a way as if the data were drawn from the assumed model. In practice, L(θ|y) is the joint probability density function (PDF) p(y|θ) under the model, evaluated at the observed values.
In particular, if the observations yi are independent, then
L(\theta|y) = \prod_{i=1}^{N} p(y_i|\theta),   (18)
where p(yi |θ) denotes the PDF of a single observation.
The value θ* that maximizes L(θ|y) serves as the best fit between the model specification and the data.
It is usually more convenient to work with the (negative) log likelihood function (LLF) − log L(θ|y). Then, θ* is the value at which the LLF attains its minimum.
As an illustration, consider a sample y = (y1,..., yN ) drawn from the normal distribution N(µ, σ2). Its likelihood function is given by
L(\theta|y) = (2\pi\sigma^2)^{-N/2} \prod_{i=1}^{N} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right),   (19)
and the LLF is
-\log L(\theta|y) = \frac{N}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{i=1}^{N} (y_i - \mu)^2 + \text{const}.   (20)
Taking the µ and σ derivatives and setting them to 0, we readily find that the MLE estimates of µ and σ are
\mu^* = \frac{1}{N} \sum_{i=1}^{N} y_i,
\qquad
(\sigma^*)^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \mu^*)^2,   (21)
respectively. Note that, while µ* is unbiased, the estimator σ* is biased (N in the denominator above, rather than the usual N − 1). The fact that the MLE estimator of a parameter is biased is a common occurrence. One can show, however, that MLE estimators are consistent, i.e. in the limit N → ∞ they converge to the true value.
Going forward, we will use the notation θ̂ rather than θ* for the MLE estimators.
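For instance, the closed-form estimates (21) can be computed directly for a synthetic normal sample (the sample below is made up purely for illustration).

import numpy as np

y = np.random.normal(1.5, 2.0, size=10000)           # synthetic sample from N(1.5, 2.0^2)
mu_mle = np.mean(y)                                   # (21): sample mean
sigma_mle = np.sqrt(np.mean((y - mu_mle) ** 2))       # (21): note N, not N - 1, in the denominator
print(mu_mle, sigma_mle)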
MLE for AR(1)
Consider now the AR(1) model and a time series of data x0, ..., xT, believed to be drawn from this model.
The easiest way to construct the likelihood function is to focus on the conditional PDF p(x1, ..., xT | x0, θ). This leads to the conditional MLE method. Let

\hat{\varepsilon}_t = x_t - \alpha - \beta x_{t-1},   (22)

for t = 1, ..., T, be the disturbances implied from the data. According to the model specification, each \hat{\varepsilon}_t is independently drawn from N(0, σ²), and thus

p(x_1, \dots, x_T | x_0, \theta) = \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=1}^{T} \hat{\varepsilon}_t^2\right)
= \frac{1}{(2\pi\sigma^2)^{T/2}} \exp\left(-\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \alpha - \beta x_{t-1})^2\right).   (23)

Hence the LLF is given by
-\log L(\theta|x) = \frac{T}{2} \log \sigma^2 + \frac{1}{2\sigma^2} \sum_{t=0}^{T-1} (x_{t+1} - \alpha - \beta x_t)^2 + \text{const}.   (24)
Minimizing this function yields:
\begin{pmatrix} \hat{\alpha} \\ \hat{\beta} \end{pmatrix}
= \begin{pmatrix} T & \sum_{t=0}^{T-1} x_t \\ \sum_{t=0}^{T-1} x_t & \sum_{t=0}^{T-1} x_t^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sum_{t=0}^{T-1} x_{t+1} \\ \sum_{t=0}^{T-1} x_t x_{t+1} \end{pmatrix},
\qquad
\hat{\sigma}^2 = \frac{1}{T} \sum_{t=1}^{T} (x_t - \hat{\alpha} - \hat{\beta} x_{t-1})^2.   (25)
This can also be explicitly rewritten as
\hat{\beta} = \frac{\sum_{t=0}^{T-1} (x_t - \bar{x})(x_{t+1} - \bar{x}_+)}{\sum_{t=0}^{T-1} (x_t - \bar{x})^2},
\qquad
\hat{\alpha} = \bar{x}_+ - \hat{\beta}\,\bar{x},   (26)

where

\bar{x} = \frac{1}{T} \sum_{t=0}^{T-1} x_t,
\qquad
\bar{x}_+ = \frac{1}{T} \sum_{t=0}^{T-1} x_{t+1}.   (27)
The exact MLE method also accounts for the likelihood of x0, drawn from its unconditional (stationary) distribution. Since x0 ∼ N(µ, Γ0),
p(x_0|\theta) = \sqrt{\frac{1-\beta^2}{2\pi\sigma^2}} \exp\left(-\frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}\right).   (28)
On the other hand, for t = 1,..., T ,
p(x_t | x_{t-1}, \dots, x_1, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_t - \alpha - \beta x_{t-1})^2}{2\sigma^2}\right).   (29)
From the definition of conditional probability we have the following identity:
p(x_0, x_1, \dots, x_T|\theta) = p(x_0|\theta) \prod_{t=1}^{T} p(x_t | x_{t-1}, \dots, x_1, \theta).   (30)
Therefore, the LLF is given by
-\log L(\theta|x) = \frac{1}{2} \log \frac{\sigma^2}{1-\beta^2} + \frac{T}{2} \log \sigma^2
+ \frac{(x_0 - \alpha/(1-\beta))^2}{2\sigma^2/(1-\beta^2)}
+ \frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_t - \alpha - \beta x_{t-1})^2 + \text{const}.   (31)
Unlike the conditional case, the minimum of the exact LLF cannot be calculated in closed form, and the calculation has to be done by means of a numerical search.
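A minimal sketch of such a numerical search, minimizing the exact LLF (31) with scipy.optimize.minimize; it assumes the simulated AR(1) sample x from the earlier snippet, and the function and variable names are ours.

import numpy as np
from scipy.optimize import minimize

def exact_nllf_ar1(params, x):
    # Exact negative log-likelihood (31) for AR(1); params = (alpha, beta, log_sigma)
    alpha, beta, log_sigma = params
    sigma2 = np.exp(2.0 * log_sigma)
    if abs(beta) >= 1.0:
        return np.inf                                 # outside the stationary region
    mu = alpha / (1.0 - beta)
    var0 = sigma2 / (1.0 - beta ** 2)                 # Gamma_0 = Var(x_0)
    resid = x[1:] - alpha - beta * x[:-1]
    T = len(x) - 1
    return (0.5 * np.log(var0) + 0.5 * (x[0] - mu) ** 2 / var0
            + 0.5 * T * np.log(sigma2) + 0.5 * np.sum(resid ** 2) / sigma2)

# x is the simulated AR(1) sample from the earlier snippet
res = minimize(exact_nllf_ar1, x0=np.array([0.0, 0.5, np.log(0.01)]),
               args=(x,), method='Nelder-Mead')
alphaEMLE, betaEMLE = res.x[0], res.x[1]
sigmaEMLE = np.exp(res.x[2])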
Here is a Python code snippet implementing the MLE for AR(1):

# Conditional MLE estimate, using the simulated sample x from the previous snippet
y = x[0:T]
yp = x[1:(T + 1)]
m = np.sum(y) / T
mp = np.sum(yp) / T
betaCMLE = np.inner(y - m, yp - mp) / np.inner(y - m, y - m)
alphaCMLE = mp - betaCMLE * m
sigmaCMLE = np.sqrt(np.inner(yp - betaCMLE * y - alphaCMLE,
                             yp - betaCMLE * y - alphaCMLE) / T)

Alternatively, one can use statsmodels functions:

# MLE estimate with statsmodels
model = ARMA(x, order=(1, 0)).fit(method='mle')
alphaMLE = model.params[0]
betaMLE = model.params[1]
sigmaMLE = np.std(model.resid)
Second order autoregressive model AR(2)
A second order autoregressive model AR(2) is specified as follows:
Xt = α + β1Xt−1 + β2Xt−2 + εt , (32)
where α, β1, β2 ∈ R, and εt ∼ N(0, σ²) is a white noise. Under this specification, the state variable depends on its two lags (rather than one lag as in AR(1)). Let us determine the conditions under which the model is covariance-stationary.
From the requirement that E(Xt ) = µ,
\mu = \frac{\alpha}{1 - \beta_1 - \beta_2},   (33)
and so we can rewrite (32) in the following form:
Xt − µ = β1(Xt−1 − µ) + β2(Xt−2 − µ) + εt . (34)
Multiplying (34) by X_{t−j} − µ, for j = 0, 1, 2, and calculating expectations, we find that

\Gamma_k =
\begin{cases}
\beta_1 \Gamma_1 + \beta_2 \Gamma_2 + \sigma^2, & \text{if } k = 0, \\
\beta_1 \Gamma_{k-1} + \beta_2 \Gamma_{k-2}, & \text{if } k = 1, 2.
\end{cases}   (35)
This identity is called the Yule-Walker equation for the autocovariance.
Dividing (35) by Γ0 yields the Yule-Walker equation for the autocorrelation:
Rk = β1Rk−1 + β2Rk−2, (36)
for k = 1, 2. This equation allows us to calculate explicitly the ACF for AR(2).
Namely, plugging in k = 1 and remembering that R−1 = R1 yields R1 = β1 + β2R1, or

R_1 = \frac{\beta_1}{1 - \beta_2}.   (37)
Plugging in k = 2 yields R2 = β1R1 + β2, or
R_2 = \beta_2 + \frac{\beta_1^2}{1 - \beta_2}.   (38)
Finally, substituting k = 0 in (35) yields
\Gamma_0 = (\beta_1 R_1 + \beta_2 R_2)\Gamma_0 + \sigma^2.   (39)
Solving this, we obtain
\Gamma_0 = \frac{(1 - \beta_2)\,\sigma^2}{(1 + \beta_2)\left((1 - \beta_2)^2 - \beta_1^2\right)}.   (40)
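The formulas (37), (38), and (40) are straightforward to code; the sketch below (with a made-up helper name and arbitrary example parameters) returns R1, R2, and Γ0 for given β1, β2, σ.

import numpy as np

def ar2_moments(beta1, beta2, sigma):
    # R_1, R_2, and Gamma_0 for a covariance-stationary AR(2), from (37), (38), (40)
    r1 = beta1 / (1.0 - beta2)                                        # (37)
    r2 = beta2 + beta1 ** 2 / (1.0 - beta2)                           # (38)
    gamma0 = ((1.0 - beta2) * sigma ** 2
              / ((1.0 + beta2) * ((1.0 - beta2) ** 2 - beta1 ** 2)))  # (40)
    return r1, r2, gamma0

print(ar2_moments(0.5, 0.2, 1.0))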
Lag operators and characteristic roots
We have not yet addressed the question under what condition an AR(2) time series is covariance-stationary. We will now introduce the concepts that settle this issue and allow us to formulate stationarity criteria for more general models.
Let us define the lag operator L as a (linear) mapping:
LXt = Xt−1. (41)
In other words, the lag operator shifts the time index back by one unit. Applying the lag operator k times shifts the time index by k units:
L^k X_t = X_{t-k}.   (42)
We refer to L^k as the k-th power of L.
Finally, if ψ(z) = ψ0 + ψ1 z + ... + ψn z^n is a polynomial in z, we associate with it an operator ψ(L) defined by
\psi(L) = \psi_0 + \psi_1 L + \dots + \psi_n L^n.   (43)
Notice that equation (32) can be stated as
ψ(L)Xt = α + εt , (44)
where ψ(z) = 1 − β1 z − β2 z². Solving this equation amounts to finding the inverse ψ(L)^{−1} of ψ(L):
X_t = \frac{\alpha}{\psi(1)} + \psi(L)^{-1} \varepsilon_t.   (45)
Suppose that we can write ψ(L)^{−1} as an infinite series

\psi(L)^{-1} = \sum_{j=0}^{\infty} \gamma_j L^j,   (46)

with

\sum_{j=0}^{\infty} |\gamma_j| < \infty.   (47)
Then

X_t = \frac{\alpha}{\psi(1)} + \sum_{j=0}^{\infty} \gamma_j \varepsilon_{t-j},   (48)

with

E(X_t) = \frac{\alpha}{\psi(1)},   (49)

and

Cov(X_t, X_{t+k}) = \sigma^2 \sum_{j=0}^{\infty} \gamma_j \gamma_{j+k}, \quad \text{for } k \ge 0,   (50)
independently of t. The series is thus covariance-stationary.
In the case of AR(1), ψ(L) = 1 − βL, it is clear that the geometric series does the job:

(1 - \beta L)^{-1} = \sum_{j=0}^{\infty} \beta^j L^j.   (51)
Condition (47) holds as long as |β| < 1. Another way of saying this is that the root z1 = 1/β of 1 − βz lies outside of the unit circle.
Now, suppose that ψ(z) is a polynomial with non-zero roots z1, ..., zn. Then
\psi(L) = c \prod_{j=1}^{n} (1 - z_j^{-1} L),   (52)
where c is the constant c = (-1)^n \psi_n \prod_{j=1}^{n} z_j.
If each of the roots zj (they may be complex) lies outside of the unit circle, i.e. |z_j^{−1}| < 1, then we can invert ψ(L) by applying (51) to each factor in (52). It is not hard to verify that the convergence criterion (47) holds, and thus the time series is stationary.
We can summarize these arguments by stating that a time series model given by the lag form equation (44) is covariance-stationary if the roots of the polynomial ψ(z) lie outside of the unit circle.
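This criterion is easy to check numerically; the sketch below, with an illustrative helper name, uses numpy.roots to test whether all roots of ψ(z) = 1 − β1 z − ... − βp z^p lie outside the unit circle.

import numpy as np

def is_covariance_stationary(betas):
    # betas = [beta_1, ..., beta_p]; build the coefficients of psi(z) from the highest power down
    coeffs = np.r_[[-b for b in betas[::-1]], 1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_covariance_stationary([0.5, 0.2]))   # AR(2) with beta1 = 0.5, beta2 = 0.2: True
print(is_covariance_stationary([1.1]))        # AR(1) with beta = 1.1: False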
General autoregressive model AR(p)
The p-th order autoregressive model AR(p) is specified as follows:
Xt = α + β1Xt−1 + ... + βpXt−p + εt , (53)
where α, βj ∈ R, and εt ∼ N(0, σ²) is a white noise.
For the covariance-stationarity, the requirement that E(Xt ) = µ yields
\mu = \frac{\alpha}{1 - \beta_1 - \dots - \beta_p}.   (54)
Furthermore, we require that the roots of the characteristic polynomial ψ(z) = 1 − β1 z − ... − βp z^p lie outside of the unit circle.
We can rewrite (53) in the following form:
Xt − µ = β1(Xt−1 − µ) + ... + βp(Xt−p − µ) + εt . (55)
Multiplying this equation by Xt−j − µ, for j = 0,..., p, and calculating expectations yields the Yule-Walker equation for the autocovariance:
\Gamma_k =
\begin{cases}
\beta_1 \Gamma_1 + \dots + \beta_p \Gamma_p + \sigma^2, & \text{if } k = 0, \\
\beta_1 \Gamma_{k-1} + \dots + \beta_p \Gamma_{k-p}, & \text{if } k = 1, \dots, p.
\end{cases}   (56)
Dividing (56) by Γ0 yields the Yule-Walker equation for the autocorrelation:
Rk = β1Rk−1 + ... + βpRk−p, (57)
for k = 1,..., p. Note that the autocorrelations satisfy essentially the same equation as the process defining Xt .
The ACF Rk can be found as the solution to the Yule-Walker equation, and is expressed in terms of the roots of the characteristic polynomial.
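As an illustration, here is a sketch that solves the Yule-Walker system (57) numerically for R1, ..., Rp and then extends the ACF by the same recursion; the helper name is ours and the example coefficients are arbitrary.

import numpy as np

def ar_acf(betas, max_lag):
    # ACF of a covariance-stationary AR(p) from the Yule-Walker equations (57)
    betas = np.asarray(betas, dtype=float)
    p = len(betas)
    # Linear system A r = b for (R_1, ..., R_p), using R_0 = 1 and R_{-j} = R_j
    A = np.eye(p)
    b = np.zeros(p)
    for k in range(1, p + 1):
        for j in range(1, p + 1):
            lag = abs(k - j)
            if lag == 0:
                b[k - 1] += betas[j - 1]
            else:
                A[k - 1, lag - 1] -= betas[j - 1]
    acf = np.concatenate(([1.0], np.linalg.solve(A, b)))
    # Extend by the recursion R_k = beta_1 R_{k-1} + ... + beta_p R_{k-p}
    for k in range(p + 1, max_lag + 1):
        acf = np.append(acf, np.dot(betas, acf[k - 1:k - p - 1:-1]))
    return acf

print(ar_acf([0.5, 0.2], 5))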
Choosing the number of lags in AR(p)
In practice, the number of lags p is unknown, and has to be determined empirically. This can be done by regressing the variable on its lagged values with p = 1, 2, ..., and assessing the impact of each added lag on the fit. It is important not to overfit the model ("torture it until it confesses") by adding too many lags.
Useful quantitative guides for model selection are various information criteria. The Akaike information criterion (AIC) is defined as follows:
AIC = 2k - 2\log L(\hat{\theta}|x).   (58)
Here k = #θ is the number of model parameters, and −log L(θ̂|x) denotes the optimized value of the LLF. According to this criterion, among the candidate models the one with the lowest value of AIC is preferred.
This is in contrast with picking the model whose optimized LLF is the lowest: that may be the result of overfitting. The AIC penalizes the number of parameters, and thus discourages overfitting.
Another popular information criterion is the Bayesian information criterion (a.k.a. the Schwarz criterion), which is defined as follows:
BIC = \log(N)\,k - 2\log L(\hat{\theta}|x),   (59)
where N = #x is the number of data points. According to this criterion, the model with the smallest value of BIC is the preferred model.
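In practice, one can fit the candidate AR(p) models and compare their criteria directly. A sketch using the same statsmodels ARMA interface as the earlier snippets (the fitted result exposes aic and bic attributes) might look as follows, with x denoting the observed series.

import numpy as np
from statsmodels.tsa.arima_model import ARMA

# Compare AR(p) fits for p = 1, ..., 4 on the series x and pick the lowest AIC/BIC
for p in range(1, 5):
    fit = ARMA(x, order=(p, 0)).fit(method='mle')
    print(p, fit.aic, fit.bic)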
Moving average model MA(1)
The moving average model MA(1) is specified as follows:
Xt = µ + εt + θεt−1,   (60)
where µ and θ are constants, and εt is white noise. The key feature of the MA(1) model is that its disturbances are autocorrelated with lag 1.
The expected value of Xt is E(Xt ) = µ, (61)
as E(εt) = 0, for all t. Its variance is
E((X_t - \mu)^2) = E((\varepsilon_t + \theta\varepsilon_{t-1})^2)
= E(\varepsilon_t^2) + 2\theta E(\varepsilon_t \varepsilon_{t-1}) + \theta^2 E(\varepsilon_{t-1}^2)
= (1 + \theta^2)\sigma^2.
For the first autocovariance, we have
E((X_t - \mu)(X_{t-1} - \mu)) = E((\varepsilon_t + \theta\varepsilon_{t-1})(\varepsilon_{t-1} + \theta\varepsilon_{t-2})) = \theta\sigma^2.
All autocovariances with lag ≥ 2 are zero (show it!). As a result, MA(1) is (unlike AR(1)) always covariance-stationary with
\Gamma_t =
\begin{cases}
(1 + \theta^2)\sigma^2, & \text{if } t = 0, \\
\theta\sigma^2, & \text{if } |t| = 1, \\
0, & \text{if } |t| \ge 2.
\end{cases}   (62)
As a result, the first autocorrelation R1 = Γ1/Γ0 is given by
R_1 = \frac{\theta}{1 + \theta^2},   (63)
with all higher order autocorrelations equal zero.
The graph below shows a simulated MA(1) time series with the following choice of parameters: µ = 1.1, θ = 0.6, σ = 0.5.
Here is the code snippet used to generate this graph in Python:

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_model import ARMA   # used later for MLE fitting

mu = 1.1
theta = 0.6
sigma = 0.5

# Simulate MA(1)
T = 250
x0 = mu
x = np.zeros(T + 1)
x[0] = x0
eps = np.random.normal(0.0, sigma, T + 1)
for i in range(1, T + 1):
    x[i] = mu + eps[i] + theta * eps[i - 1]

# Take a look at the simulated time series
plt.plot(x)
plt.show()
MLE for MA(1)
As in the case of AR(1), there are two natural approaches to MLE of an MA(1) model: conditional on the initial value of ε and exact. We begin with the conditional MLE method, which is somewhat easier. Since the value of ε0 cannot be calculated from the observed data, we are free to set it arbitrarily; we choose ε0 = 0. All the probabilities calculated below are conditional on this choice. We then have, for t = 1,..., T ,
εt = xt − µ − θεt−1, (64)
and so the conditional PDF of xt is

p(x_t | x_{t-1}, \dots, x_1, \varepsilon_0 = 0, \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_t^2}{2\sigma^2}\right).   (65)
This expression is deceivingly simple: in reality, εt is a nested function of all xs with s ≤ t.
The likelihood function of the sample x1, ..., xT is given by the product of the probabilities above, and so
L(\theta|x, \varepsilon_0 = 0) = \prod_{t=1}^{T} p(x_t | x_{t-1}, \dots, x_1, \varepsilon_0 = 0, \theta),   (66)
The log likelihood thus has the following form:
-\log L(\theta|x, \varepsilon_0 = 0) = \frac{T}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\sum_{t=1}^{T}\varepsilon_t^2 + \text{const}.   (67)
This is a quadratic function of the xt's. It is cumbersome to write down explicitly, but easy to code in a programming language. Its minimum is easiest to find by means of a numerical search.
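A minimal sketch of this numerical search, computing the disturbances recursively via (64) with ε0 = 0 and minimizing (67) with scipy; it assumes the simulated MA(1) sample x from the earlier snippet, and the names are ours.

import numpy as np
from scipy.optimize import minimize

def cond_nllf_ma1(params, x):
    # Conditional negative log-likelihood (67) for MA(1), with eps_0 = 0
    mu, theta, log_sigma = params
    sigma2 = np.exp(2.0 * log_sigma)
    T = len(x) - 1
    eps = 0.0
    sse = 0.0
    for t in range(1, T + 1):
        eps = x[t] - mu - theta * eps          # (64), computed recursively
        sse += eps ** 2
    return 0.5 * T * np.log(sigma2) + 0.5 * sse / sigma2

# x is the simulated MA(1) sample from the earlier snippet
res = minimize(cond_nllf_ma1, x0=np.array([np.mean(x), 0.0, np.log(np.std(x))]),
               args=(x,), method='Nelder-Mead')
muCMLE, thetaCMLE = res.x[0], res.x[1]
sigmaCMLE = np.exp(res.x[2])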
In case of |θ| < 1, the impact of the choice ε0 = 0 phases out as we iterate through time steps. For |θ| > 1 the impact of this choice accumulates, and the method cannot be used. For the exact MLE method, we notice that the joint PDF of x is given by
p(x|\theta) = \frac{1}{(2\pi)^{T/2}\det(\Omega)^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^{\mathsf{T}}\Omega^{-1}(x-\mu)\right),   (68)
and thus

-\log L(\theta|x) = \frac{1}{2}\log\det(\Omega) + \frac{1}{2}(x-\mu)^{\mathsf{T}}\Omega^{-1}(x-\mu).   (69)
Here, Ω is a band diagonal matrix:
\Omega = \sigma^2
\begin{pmatrix}
1+\theta^2 & \theta & 0 & \cdots & 0 \\
\theta & 1+\theta^2 & \theta & \cdots & 0 \\
0 & \theta & 1+\theta^2 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & \cdots & \theta & 1+\theta^2
\end{pmatrix}.   (70)
The numerics of minimizing (69) can be handled either by (i) a clever triangular factorization of Ω, or by (ii) the Kalman filter method (we will discuss Kalman filters later in this course). Unlike the conditional MLE method, the exact method does not suffer from instabilities if |θ| ≥ 1.
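For illustration, here is a sketch of the exact negative LLF (69) built directly from the band matrix Ω of (70); forming the full T × T matrix is only practical for moderate T, which is precisely why the factorization or Kalman filter approaches mentioned above are preferred in practice. The function name is ours.

import numpy as np

def exact_nllf_ma1(params, x):
    # Exact negative log-likelihood (69) for MA(1), using the band matrix Omega of (70)
    mu, theta, log_sigma = params
    sigma2 = np.exp(2.0 * log_sigma)
    T = len(x)
    omega = sigma2 * ((1.0 + theta ** 2) * np.eye(T)
                      + theta * (np.eye(T, k=1) + np.eye(T, k=-1)))
    resid = x - mu
    sign, logdet = np.linalg.slogdet(omega)
    return 0.5 * logdet + 0.5 * resid @ np.linalg.solve(omega, resid)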
Here is the Python code snippet implementing the MLE for MA(1) using statsmodels:

# MLE estimate with statsmodels
model = ARMA(x, order=(0, 1)).fit(method='mle')
muMLE = model.params[0]
thetaMLE = model.params[1]
sigmaMLE = np.std(model.resid)
General moving average model MA(q)
A q-th order moving average model MA(q) is specified as follows:
Xt = µ + εt + θ1εt−1 + ... + θq εt−q,   (71)
where µ and θj are constants, and εt is white noise. In other words, the MA(q) model fluctuates around µ with disturbances that are autocorrelated up to lag q.
The expected value of Xt is

E(X_t) = \mu,   (72)

while its autocovariance is

\Gamma_j =
\begin{cases}
(1 + \theta_1^2 + \dots + \theta_q^2)\sigma^2, & \text{if } j = 0, \\
(\theta_j + \theta_{j+1}\theta_1 + \dots + \theta_q\theta_{q-j})\sigma^2, & \text{if } j = 1, \dots, q, \\
0, & \text{if } j > q.
\end{cases}   (73)
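Formula (73) translates directly into code; in the sketch below (helper name ours), setting θ0 = 1 lets all the non-zero cases be written as a single dot product.

import numpy as np

def ma_autocov(thetas, sigma, max_lag):
    # Autocovariances (73) of an MA(q) with coefficients theta_1, ..., theta_q
    th = np.r_[1.0, np.asarray(thetas, dtype=float)]   # prepend theta_0 = 1
    q = len(th) - 1
    gamma = np.zeros(max_lag + 1)
    for j in range(min(q, max_lag) + 1):
        gamma[j] = sigma ** 2 * np.dot(th[:q + 1 - j], th[j:])
    return gamma

print(ma_autocov([0.6], 0.5, 3))   # MA(1): Gamma_0 = (1 + 0.36) * 0.25, Gamma_1 = 0.6 * 0.25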
ARMA(p, q) model
A mixed autoregressive moving average model ARMA(p, q) is specified as follows:
Xt = α + β1Xt−1 + ... + βpXt−p + εt + θ1εt−1 + ... + θq εt−q , (74)
where α and βj , θk are constants, and εt is white noise. The equation above has the following lag operator representation:
ψ(L)Xt = α + ϕ(L)εt , (75)
where
\psi(z) = 1 - \beta_1 z - \dots - \beta_p z^p,
\qquad
\varphi(z) = 1 + \theta_1 z + \dots + \theta_q z^q.   (76)
The process (74) is covariance-stationary if the roots of ψ lie outside of the unit circle.
In this case, we can write the model in the form
Xt = µ + γ(L)εt , (77)
where µ = α/ψ(1), and γ(L) = ψ(L)−1ϕ(L). Explicitly, γ(L) is an infinite series:
\gamma(L) = \sum_{j=0}^{\infty} \gamma_j L^j,   (78)
with

\sum_{j=0}^{\infty} |\gamma_j|^2 < \infty.   (79)
This form of the model specification is called the moving average form.
The parameters of ARMA models are estimated by means of the MLE method. The complexity of the computation required to minimize the LLF increases with the number of parameters. Information criteria, such as AIC or BIC, remain useful quantitative guides for model selection.
Forecasting time series with ARMA(p, q)
An important function of time series analysis is making predictions about future values of the observed data, i.e. forecasting.
The data-based forecasting problem can be formulated as follows: given the observations X1:t = X1, ..., Xt, what is the best forecast X*_{t+1|1:t} of Xt+1?
In mathematical terms, the problem requires minimizing a suitable loss function. We choose to minimize the mean squared error (MSE) given by