Econometrics

Generalized Autoregressive Score Models by: Drew Creal, Siem Jan Koopman, André Lucas

To capture the dynamic behavior of univariate and multivariate processes, we can allow parameters to be time-varying by having them as functions of lagged dependent variables as well as exogenous variables. Although other approaches of introducing time dependence exists, the GAS models, Generalized Autoregressive Score, particular approach have become popular in applied and econometrics. Here we discuss a further development of Creal, Koopman, and Lucas (2012) which is based on the score function of the predictive model density at time t.

Typical examples are the generalized autoregressive that time-varying parameters in a multi-state model for conditional heteroskedasticity (GARCH) models of pooled marked point-processes can be introduced natu- Engle (1982) and Bollerslev (1986), the autoregressive rally in our framework. conditional duration and intensity (ACD and ACI, respectively) models of Engle and Russell (1998) and the dynamic copula models of Patton (2006). Creal, The GAS model

Koopman, and Lucas (2012) argue that the score Let N × 1 vector yt denote the dependent variable of function is an effective choice for introducing a driving interest, ft the time-varying parameter vector, xt a vector mechanism for time-varying parameters. In particular, of exogenous variables (covariates), all at time t, and θ t by scaling the score function appropriately, standard a vector of static parameters. Define Y = {y1, . . . , yt}, t t dynamic models such as the GARCH, ACD, and ACI F = {f0, f1, . . . , ft}, and X = {x1, . . . , xt}. The available models can be recovered. Application of this framework information set at time t consists of ft, t where (1) to other non-linear, non-Gaussian, possibly multivariate, { F } models will lead to the formulation of new time-varying t 1 t 1 t = Y − ,F− l, X , for t =1,...,n. (1) parameter models. Ft { } They have labeled their model as the generalized auto- regressive score (GAS) model. Here we aim to introduce We assume that yt is generated by the observation density the GAS model and to illustrate the approach for a class y p(y f , ; θ). (1)(1) of multivariate point-process models that is used empiri- t ∼ t | t Ft cally for the modeling credit risk. We further aim to show Furthermore, we assume that the mechanism for updating

Siem Jan Koopman the time-varying parameter ft is given by the familiar autoregressive updating equation Prof. Dr. Siem Jan Koopman is Professor of Econometrics at the Vrije Universiteit p q Amsterdam and research fellow at the Tinbergen ft+1 = ω + Aist i+1 + Bjft j+1, (2) − − (2) Institute since 1999. His Ph.D. is from the London i=1 j=1 School of Economics (LSE) and dates back to 1992. He had positions at the LSE between 1992 where ω is a vector of constants, coefficient matrices(1) A and 1997 and at the CentER (Tilburg University) i and B have appropriate dimensions for i = 1, . . . , p and between 1997 and 1999. His research interests j are Statistical analysis of time series; Time series j = 1, . . . , q, while st is an appropriate function of past s = s (y ,f , ; θ) econometrics; Financial econometrics; Kalman data, t t t t F t . The unknown coefficients in filter; Simulation-based estimation; Forecasting. (2) are functions of θ, that is ω = ω(θ), Ai = Ai(θ), and Bj (1) (1)

= Bj(θ) for i = 1, . . . , p and j = 1, . . . , q.

AENORM vol. 20 (75) May 2012 37

1

1

1

1

1

1 1 Econometrics Econometrics

The approach is based on the observation density (1) where t = ln p(yt ft, t; θ) for a realization of y . Eva- | F t for a given parameter ft. When observation yt is realized, luating the log-likelihood function of the GAS model is

time-varying ft to the next period t + 1 is updated using particularly simple. It only requires the implementation (2) with of the GAS updating equation (2) and the evaluation of * ∂ ln p(yt ft , t ; θ) t for a particular value θ of θ. st = St t, t = | F , ·∇ ∇ ∂ft (3) Example : GARCH models Consider the basic model

yt = σtεt where the Gaussian disturbance εt has zero mean St = S(t, ft , t ; θ), F and unit variance while σt is a time-varying standard de- where S(·) is a matrix function. Given the dependence viation. It is a basic exercise to show that the GAS (1, 1) 1 2 of the driving mechanism in (2) on the scaled score model with St = − and f = σ reduces to It t 1 t t vector (3), the equations (1) – (3) define the generalized | − f = ω + A y2 f + B f , (7) autoregressive score model with orders p and q. We t+1 1 t − t 1 t refer to the model as GAS (p, q) and we typically take  p = q = 1. which is equivalent to the standard GARCH(1, 1) model

The use of the score for updating ft is intuitive. It as given by defines a steepest ascent direction for improving the f = α + α y2 + β f ,f= σ2, (8) model’s local fit in terms of the likelihood or density t+1 0 1 t 1 t t t at time t given the current position of the parameter

ft. This provides the natural direction for updating where coefficients α0 = ω, α1 = A1 and β1 = B1 − A1 are (1)

the parameter. In addition, the score depends on the unknown. When we assume that εt follows a Student’s t complete density, and not only on the first or second distribution with ν degrees of freedom and unit variance,

order moments of the observations yt. Via its choice the GAS (1, 1) specification for the conditional variance

of the scaling matrix St, the GAS model allows for leads to the updating equation additional flexibility in how the score is used for 1 updating f . In many situations, it is natural to consider ft+1 = ω + A1 1+3ν− t · · a form of scaling that depends on the variance of the (1 + ν 1) − 2  (9) score. For example, we can define the scaling matrix 1 1 2 1 yt ft (1 2ν− )(1 + ν− y /(1 2ν− ) ft) − as  − t −  (4) +B1ft. 1 St = t−t 1, t t 1 =Et 1 [ t t ] , I | − I | − − ∇ ∇ This model is clearly different compatered to the standard

where Et−1 is expectation with respect to the density t-GARCH(1, 1) model which has the Student’s t density p(y f , ; θ). For this choice of S , the GAS model in (1) with the updating equation (7). The denominator t| t Ft t encompasses well-known models such as GARCH, ACD of the second term in the right-hand side of (9) causes a and ACI. Another possibility for scaling is more moderate increase in the variance for a large (5) realization of |y | as long as ν is finite. The intuition is 1 t St = t t 1, t t 1 t t 1 = t−t 1, clear: if the errors are modeled by a fattailed distribution, J | − J | − J | − I | − a large absolute realization of yt does not necessitate a

where St is defined as the square root matrix of the substantial increase in the variance. Multivariate exten- (pseudo)-inverse information matrix for (1) with sions of this approach are developed in Creal, Koopman,

respect to ft. An advantage of this specific choice for and Lucas (2011).

St is that the statistical properties of the corresponding GAS model become more tractable. In particular, the Example : Regression model The time-varying linear

driver st becomes a martingale difference with unity regression model yt = xt′βt + εt has a k × 1 vector xt of variance. exogenous variables, a k × 1 vector of time-varying re-

A convenient property of the GAS model is the gression coefficients βt and normally independently dis- 2 relatively simple way of estimating parameters by tributed disturbances εt ~ N(0, σ ). Let ft = βt. The scaled maximum likelihood (ML). This feature applies to all score function based on St = t t 1 in for this regres- J | − special cases of GAS models. For an observed time sion model is given by

series y1, . . . , yn and by adopting the standard prediction 1/2 s =(x x )− x (y x f )/σ, (10) error decomposition, we can express the maximization t t t t t − t t problem as where the inverse t t 1 of used to construct t t 1 is I | − J | − n the Moore-Penrose pseudo inverse to account for the ˆ θ = arg max t, (6) singularity of x x ′. The GAS (1, 1) specification for the θ t t t=1 time-varying regression coefficient becomes

38 AENORM vol. 20 (75) May 2012

1

Econometrics Econometrics

denote the last event time before time t and let λk,t = (λ1k,t, xt (yt xt ft) ft+1 = ω + A1 − + B1ft. (11) . . . , λ ) be a J × 1 vector of log-intensities. We model (x x )1/2 · σ Jk,t ′ t t the log intensities by The updating equation (11) can be extended by λ = d + Zf + X β, including σ2 as a time-varying factor and by adjusting the k,t t k,t (12) scaled score function (10) for the time-varying parameter 2 vector ft = (βt′ , σt )′. where d is a J × 1 vector of baseline intensities, Z is a J × r matrix of factor loadings, and β is a p × 1 vector of regression parameters for the exogenous covariates

Illustration: dynamic pooled marked point Xk,t. The r × 1 vector of dynamic factors ft is specified process by the GAS (1, 1) updating equation (2) with ω = 0. (1)

Statistical models with time-varying intensities have Since ft is not observed directly, we need to impose a sign received much attention in finance and econometrics. restriction on Z to obtain economic interpretations for The principal areas of application in economics include the time-varying parameters. We assume the model has intraday trade data (market microstructure), defaults of a factor structure: intensities of all firms are driven by firms, credit rating transitions and (un)employment spells the same vector of time-varying systematic parameters

over time. To illustrate the GAS model in this setting, we ft. The log-likelihood specification using (12) is given by consider an application from the credit risk literature in J N which pooled marked point-processes play an important = y λ R t jk,t jk,t − jk,t· (13) role. We empirically analyze credit risk and rating j=1 k=1 transitions within the GAS framework for Moody’s data. (t t∗) exp (λjk,t∗ ) , Let yk,t = (y1k,t, . . . , yJk,t)′ be a vector of marks of J − ·

competing risk processes for firmsk = 1, . . . ,N. We have where Rk,t = (R1k,t, . . . ,RJk,t)′ and Rjk,t is a zero-one variable

yjk,t = 1 if event type j out of J materializes for firm k indicating whether company k is potentially subject to at time t, and zero otherwise, and we assume that the risk j at time t. Define P as a J × J diagonal matrix with

pooled is orderly, such that with probability jth diagonal element pj,t = k Rjk,t · exp[λjk,t] / j,k Rjk,t · 1 precisely one event occurs at each event time. Let t* exp[λ ] = P[ y = 1 | y = 1], i.e., the probability jk,t k jk,t j,k jk,t

Figure 1: The estimated intensities (in basis points) for each transition type for the one-factor marked point process model. Moody’s rating histories are for all US corporates between January 1981 and March 2010.

1

40 AENORM vol. 20 (75) May 2012 Econometrics

that the next event is of type j given that an event happens Figure 1 compares the estimates of ft obtained from for firmk . Based on the first and second derivative of t the two model specifications. For each of the four and setting St = t t 1 , we obtain the score and scaling possible rating transitions, we plot the intensity of J | − matrix the transition (in basis points on a log scale). These intensities, after dividing them by the number of days N in a year, can approximately be interpreted as the daily = Z y R (t t∗) exp(λ ) , ∇t k,t − k,t · − · k,t∗ transition probabilities for each rating transition type. We k=1  learn from Figure 1 that the estimates of the time-varying  1 2 St =(Z PZ)− . probabilities of the GAS model are almost identical to those of the KLM08 model. However, in our current By combining these basic elements into a GAS GAS framework, the results can be obtained without the specification, we have obtained a new timevarying need of computationally intensive simulation methods parameter model for credit rating transitions. In required for models such as KLM08 . It underlines an comparison with related models, parameter estimation attractive feature of our GAS approach. for the current model is much easier. References

Application to Moody’s credit rating data T. Bollerslev: “Generalized autoregressive conditional For our illustration, we adopt the marked point-process heteroskedasticity”, Journal of Econometrics 31 (3) model (12), (13) and (2) with ω = 0 and s = S given (1986)(1) 307–327 t t t by (14) for a data set which contains Moody’s rating histories of all US corporates over the period January D. D. Creal, S. J. Koopman, and A. Lucas: “A dynamic 1981 to March 2010. The initial credit ratings for each multivariate heavy-tailed model for time-varying firm are known at the beginning of the sample and we volatilities and correlations”, Journal of Business & observe the transitions from one rating category to Economic Statistics 29 (2011) 552–563 another over time. Moody’s ratings include 21 different categories, some of which are sparsely populated. For the D. D. Creal, S. J. Koopman, and A. Lucas: “Generalized sake of this illustration, therefore, we pool the ratings into Autoregressive Score Models with Applications”, a much smaller set of 2 credit classes: investment grade Journal of Applied Econometrics, forthcoming (2012) (IG) and sub-investment grade (SIG). Default is treated as an absorbing category: it makes for J = 4 possible R. F. Engle: “Autoregressive conditional hetero- events. It is often concluded in credit risk studies that scedasticity with estimates of the variance of United default probabilities are countercyclical. We therefore Kingdom inflation”, Econometrica 50 (4) (1982) 987– allow the log-intensities (12) to depend upon the annual 1007 growth rate (standardized) of US industrial production as an exogenous variable. We only present the results for a R. F. Engle, and J. R. Russell: “Autoregressive conditional single factor, r = 1. In order to identify the parameters in duration: a new model for irregularly spaced transaction the 1 × 4 vector Z, we set its last element to unity so that data”, Econometrica 66 (5) (1998) 1127–1162 our single factor is common to all transition types but is identified as the event representing a move from SIG to S. J. Koopman, A. Lucas, and A. Monteiro: “The multi- default. state latent factor intensity model for credit rating For the one-factor model (r = 1), we perform a full transitions”, Journal of Econometrics 142 (1) (2008) benchmark analysis for the new GAS model in relation to 399–424 our benchmark model of Koopman et al. (2008), hereafter referred to as KLM08 . The marked point process KLM08 A. J. Patton: “Modelling asymmetric exchange rate model has the same observation density (13) as the GAS dependence”, International Economic Review 47 (2) model. However, the time-varying parameter ft follows (2006) 527–556 an Ornstein-Uhlenbeck process driven by an independent . Parameter estimation for the KLM08 model is more involved than for the GAS model due to the presence of a dynamic, non-predictable stochastic component.

AENORM vol. 20 (75) May 2012 41

1