Generalized Autoregressive Score Models By: Drew Creal, Siem Jan Koopman, André Lucas
Total Page:16
File Type:pdf, Size:1020Kb
Econometrics Generalized Autoregressive Score Models by: Drew Creal, Siem Jan Koopman, André Lucas To capture the dynamic behavior of univariate and multivariate time series processes, we can allow parameters to be time-varying by having them as functions of lagged dependent variables as well as exogenous variables. Although other approaches of introducing time dependence exists, the GAS models, Generalized Autoregressive Score, particular approach have become popular in applied statistics and econometrics. Here we discuss a further development of Creal, Koopman, and Lucas (2012) which is based on the score function of the predictive model density at time t. Typical examples are the generalized autoregressive that time-varying parameters in a multi-state model for conditional heteroskedasticity (GARCH) models of pooled marked point-processes can be introduced natu- Engle (1982) and Bollerslev (1986), the autoregressive rally in our framework. conditional duration and intensity (ACD and ACI, respectively) models of Engle and Russell (1998) and the dynamic copula models of Patton (2006). Creal, The GAS model Koopman, and Lucas (2012) argue that the score Let N × 1 vector yt denote the dependent variable of function is an effective choice for introducing a driving interest, ft the time-varying parameter vector, xt a vector mechanism for time-varying parameters. In particular, of exogenous variables (covariates), all at time t, and θ t by scaling the score function appropriately, standard a vector of static parameters. Define Y = {y1, . , yt}, t t dynamic models such as the GARCH, ACD, and ACI F = {f0, f1, . , ft}, and X = {x1, . , xt}. The available models can be recovered. Application of this framework information set at time t consists of ft, t where (1) to other non-linear, non-Gaussian, possibly multivariate, { F } models will lead to the formulation of new time-varying t 1 t 1 t = Y − ,F− l, X , for t =1,...,n. (1) parameter models. Ft { } They have labeled their model as the generalized auto- regressive score (GAS) model. Here we aim to introduce We assume that yt is generated by the observation density the GAS model and to illustrate the approach for a class y p(y f , ; θ). (1)(1) of multivariate point-process models that is used empiri- t ∼ t | t Ft cally for the modeling credit risk. We further aim to show Furthermore, we assume that the mechanism for updating Siem Jan Koopman the time-varying parameter ft is given by the familiar autoregressive updating equation Prof. Dr. Siem Jan Koopman is Professor of Econometrics at the Vrije Universiteit p q Amsterdam and research fellow at the Tinbergen ft+1 = ω + Aist i+1 + Bjft j+1, (2) − − (2) Institute since 1999. His Ph.D. is from the London i=1 j=1 School of Economics (LSE) and dates back to 1992. He had positions at the LSE between 1992 where ω is a vector of constants, coefficient matrices(1) A and 1997 and at the CentER (Tilburg University) i and B have appropriate dimensions for i = 1, . , p and between 1997 and 1999. His research interests j are Statistical analysis of time series; Time series j = 1, . , q, while st is an appropriate function of past s = s (y ,f , ; θ) econometrics; Financial econometrics; Kalman data, t t t t F t . The unknown coefficients in filter; Simulation-based estimation; Forecasting. (2) are functions of θ, that is ω = ω(θ), Ai = Ai(θ), and Bj (1) (1) = Bj(θ) for i = 1, . , p and j = 1, . , q. AENORM vol. 20 (75) May 2012 37 1 1 1 1 1 1 1 Econometrics Econometrics The approach is based on the observation density (1) where t = ln p(yt ft, t; θ) for a realization of y . Eva- | F t for a given parameter ft. When observation yt is realized, luating the log-likelihood function of the GAS model is time-varying ft to the next period t + 1 is updated using particularly simple. It only requires the implementation (2) with of the GAS updating equation (2) and the evaluation of * ∂ ln p(yt ft , t ; θ) t for a particular value θ of θ. st = St t, t = | F , ·∇ ∇ ∂ft (3) Example : GARCH models Consider the basic model yt = σtεt where the Gaussian disturbance εt has zero mean St = S(t, ft , t ; θ), F and unit variance while σt is a time-varying standard de- where S(·) is a matrix function. Given the dependence viation. It is a basic exercise to show that the GAS (1, 1) 1 2 of the driving mechanism in (2) on the scaled score model with St = − and f = σ reduces to It t 1 t t vector (3), the equations (1) – (3) define the generalized | − f = ω + A y2 f + B f , (7) autoregressive score model with orders p and q. We t+1 1 t − t 1 t refer to the model as GAS (p, q) and we typically take p = q = 1. which is equivalent to the standard GARCH(1, 1) model The use of the score for updating ft is intuitive. It as given by defines a steepest ascent direction for improving the f = α + α y2 + β f ,f= σ2, (8) model’s local fit in terms of the likelihood or density t+1 0 1 t 1 t t t at time t given the current position of the parameter ft. This provides the natural direction for updating where coefficients α0 = ω, α1 = A1 and β1 = B1 − A1 are (1) the parameter. In addition, the score depends on the unknown. When we assume that εt follows a Student’s t complete density, and not only on the first or second distribution with ν degrees of freedom and unit variance, order moments of the observations yt. Via its choice the GAS (1, 1) specification for the conditional variance of the scaling matrix St, the GAS model allows for leads to the updating equation additional flexibility in how the score is used for 1 updating f . In many situations, it is natural to consider ft+1 = ω + A1 1+3ν− t · · a form of scaling that depends on the variance of the (1 + ν 1) − 2 (9) score. For example, we can define the scaling matrix 1 1 2 1 yt ft (1 2ν− )(1 + ν− y /(1 2ν− ) ft) − as − t − (4) +B1ft. 1 St = t−t 1, t t 1 =Et 1 [ t t ] , I | − I | − − ∇ ∇ This model is clearly different compatered to the standard where Et−1 is expectation with respect to the density t-GARCH(1, 1) model which has the Student’s t density p(y f , ; θ). For this choice of S , the GAS model in (1) with the updating equation (7). The denominator t| t Ft t encompasses well-known models such as GARCH, ACD of the second term in the right-hand side of (9) causes a and ACI. Another possibility for scaling is more moderate increase in the variance for a large (5) realization of |y | as long as ν is finite. The intuition is 1 t St = t t 1, t t 1 t t 1 = t−t 1, clear: if the errors are modeled by a fattailed distribution, J | − J | − J | − I | − a large absolute realization of yt does not necessitate a where St is defined as the square root matrix of the substantial increase in the variance. Multivariate exten- (pseudo)-inverse information matrix for (1) with sions of this approach are developed in Creal, Koopman, respect to ft. An advantage of this specific choice for and Lucas (2011). St is that the statistical properties of the corresponding GAS model become more tractable. In particular, the Example : Regression model The time-varying linear driver st becomes a martingale difference with unity regression model yt = xt′βt + εt has a k × 1 vector xt of variance. exogenous variables, a k × 1 vector of time-varying re- A convenient property of the GAS model is the gression coefficients βt and normally independently dis- 2 relatively simple way of estimating parameters by tributed disturbances εt ~ N(0, σ ). Let ft = βt. The scaled maximum likelihood (ML). This feature applies to all score function based on St = t t 1 in for this regres- J | − special cases of GAS models. For an observed time sion model is given by series y1, . , yn and by adopting the standard prediction 1/2 s =(x x )− x (y x f )/σ, (10) error decomposition, we can express the maximization t t t t t − t t problem as where the inverse t t 1 of used to construct t t 1 is I | − J | − n the Moore-Penrose pseudo inverse to account for the ˆ θ = arg max t, (6) singularity of x x ′. The GAS (1, 1) specification for the θ t t t=1 time-varying regression coefficient becomes 38 AENORM vol. 20 (75) May 2012 1 Econometrics Econometrics denote the last event time before time t and let λk,t = (λ1k,t, xt (yt xt ft) ft+1 = ω + A1 − + B1ft. (11) . , λ ) be a J × 1 vector of log-intensities. We model (x x )1/2 · σ Jk,t ′ t t the log intensities by The updating equation (11) can be extended by λ = d + Zf + X β, including σ2 as a time-varying factor and by adjusting the k,t t k,t (12) scaled score function (10) for the time-varying parameter 2 vector ft = (βt′ , σt )′. where d is a J × 1 vector of baseline intensities, Z is a J × r matrix of factor loadings, and β is a p × 1 vector of regression parameters for the exogenous covariates Illustration: dynamic pooled marked point Xk,t.