Agenda Event Count Models Event Count Time Series
Time Series Models for Event Counts, I
Patrick T. Brandt
University of Texas, Dallas
July 2010
Agenda
- Introduction to basic event count time series
- Examples of why we need separate models for these kinds of data
- PEWMA and PAR(p) introduction
- Fitting and interpreting PEWMA and PAR(p) models using PESTS: dynamic inferences
- Changepoint models for count data
- Some recent extensions and new models
Preface / Getting Started
- Get R from your favorite CRAN mirror. The mirror list is at: http://cran.r-project.org/mirrors.html
- Get the R source code for PESTS from http://www.utdallas.edu/~pbrandt/code/pests.r
- These slides, data, and R code for examples are at http://www.utdallas.edu/~pbrandt/code/count-examples
- Put pests.r and the data files you are going to use in the same folder.
1 Event Count Models
  - Data Examples
  - Poisson Models
  - Negative Binomial Models
2 Event Count Time Series
  - Existing approaches
  - Models for time series of counts
  - PEWMA
  - PAR(p)
Example: Mayhew’s Legislation Data
Example: Militarized Interstate Disputes (MIDS)
[Figure: time series plot of the MIDS counts (x-axis: Time) and the autocorrelation function of the series (x-axis: Lag, y-axis: ACF).]
Chinese and Taiwanese MIDS
[Figure: China and Taiwan threats and actions, MIDS: 1950−2001. Four stacked panels: China threats, China actions, Taiwan threats, Taiwan actions (x-axis: Time).]
Count data models
- Count data result from counting the number of discrete events over some period of time.
- Typically, count models assume that the process that generates the events is independent of time (t). This means that the process is memoryless.
- The times between events are assumed to be independent and exponentially distributed.
- This is a very strong set of restrictions: most event data violate them in one or more ways.
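The link between exponential waiting times and Poisson counts can be checked by simulation. The sketch below is illustrative only: the rate of 3 events per period, the seed, and all variable names are assumptions, not values taken from the examples in these slides.

```r
# Simulate a memoryless event process: independent exponential gaps between events.
set.seed(1)
rate <- 3                                   # assumed events per unit of time
gaps <- rexp(20000, rate = rate)            # waiting times between events
event_times <- cumsum(gaps)                 # calendar time of each event

# Count events in each complete unit-length period.
n_periods <- floor(max(event_times))
in_window <- event_times < n_periods
counts <- tabulate(floor(event_times[in_window]) + 1, nbins = n_periods)

# For a Poisson process the mean and variance of the counts are both about `rate`.
c(mean = mean(counts), variance = var(counts))
```

Real event data often fail this check: a variance well above the mean, or counts that are correlated over time, signals that the memoryless assumption does not hold.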
Event Count Models
Typically, one of several models is used to fit a regression model to count data:
- Poisson regression
- Negative binomial regression
- Generalized event count
- Generalized estimating equations
Poisson Models
The standard approach to modeling event count data is to use a Poisson distribution:
$$\Pr(y_t \mid \mu_t) = \frac{\mu_t^{y_t} e^{-\mu_t}}{y_t!}.$$
Estimation of the mean parameter is accomplished via maximum likelihood methods.
Note that for the Poisson model, $E(y_t) = V(y_t) = \mu_t$. A Poisson regression model can be created by the parameterization $\mu_t = \exp(X_t \delta)$.
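A Poisson regression with this parameterization can be fit by maximum likelihood with base R's glm(). The sketch below uses simulated data; the coefficient values, sample size, and variable names are assumptions chosen for illustration.

```r
# Simulate counts from a Poisson regression with known (assumed) coefficients.
set.seed(42)
n <- 5000
x <- rnorm(n)
delta <- c(0.5, 0.8)                 # assumed intercept and slope
mu <- exp(delta[1] + delta[2] * x)   # mu_t = exp(X_t delta)
y <- rpois(n, lambda = mu)

# Maximum likelihood fit; the log link matches the exp() parameterization.
fit <- glm(y ~ x, family = poisson(link = "log"))
delta_hat <- unname(coef(fit))
round(delta_hat, 2)
```

The estimates should be close to the assumed values because the simulated data satisfy the Poisson assumptions exactly, including independence over observations.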
Negative binomial models
This model allows for $V(y) > E(y)$, which is common in count data. This is known as overdispersion. The negative binomial distribution is given by
$$\Pr(y_t \mid \mu_t, \nu_t) = \frac{\Gamma(y_t + \nu_t)}{y_t!\,\Gamma(\nu_t)} \left(\frac{\nu_t}{\nu_t + \mu_t}\right)^{\nu_t} \left(\frac{\mu_t}{\nu_t + \mu_t}\right)^{y_t}$$
The $\nu_t$ parameter captures the level of overdispersion, or how much larger the variance is than the mean. We can make this a regression model by defining $\mu_t = \exp(X_t \delta)$.
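As a check on the notation, the density above can be coded directly and compared with base R's dnbinom(), which uses the same parameterization when called with its size and mu arguments (size plays the role of ν). The function name nb_pmf and the numeric values are hypothetical.

```r
# Negative binomial pmf written directly from the formula, on the log scale for stability.
nb_pmf <- function(y, mu, nu) {
  exp(lgamma(y + nu) - lfactorial(y) - lgamma(nu) +
        nu * log(nu / (nu + mu)) + y * log(mu / (nu + mu)))
}

y <- 0:10
mu <- 4      # assumed mean
nu <- 2      # assumed dispersion parameter

# Largest discrepancy between the hand-coded pmf and dnbinom(size = nu, mu = mu).
max_diff <- max(abs(nb_pmf(y, mu, nu) - dnbinom(y, size = nu, mu = mu)))
max_diff
```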
How the negative binomial captures overdispersion
The conditional mean for the negative binomial (NB) regression model is
$$E[y_t \mid X_t] = \mu_t = \exp(X_t \delta).$$
The conditional variance is
$$V[y_t \mid X_t] = \mu_t \left(1 + \frac{\mu_t}{\nu_t}\right) = \exp(X_t \delta) \left(1 + \frac{\exp(X_t \delta)}{\nu_t}\right).$$
This variance is unidentified because $\nu_t$ has a t index: there is a separate dispersion parameter for every observation.
Identifying the negative binomial
The common assumption is that the variance parameter $\nu_t$ is the same across all of the observations (this same assumption is used in the subsequent time series models). If we assume that $\nu_t = \alpha^{-1}$ and $\alpha > 0$, then
$$V[y_t \mid X_t] = \mu_t \left(1 + \frac{\mu_t}{\alpha^{-1}}\right) = \mu_t + \alpha \mu_t^2.$$
(So now you know how to interpret correctly that α parameter reported from Stata for nbreg.)
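The variance formula can be verified numerically. The sketch below draws negative binomial counts with ν = 1/α and compares the sample variance with μ + αμ². The values of α and μ are assumptions picked so the arithmetic is easy to follow.

```r
# Draw NB counts with dispersion nu = 1 / alpha and compare moments.
set.seed(7)
alpha <- 0.5                                      # assumed overdispersion parameter
mu <- 4                                           # assumed mean
y <- rnbinom(200000, size = 1 / alpha, mu = mu)   # size corresponds to nu

implied_var <- mu + alpha * mu^2                  # 4 + 0.5 * 16 = 12
c(sample_mean = mean(y), sample_var = var(y), implied_var = implied_var)
```

With α = 0.5 the variance is three times the mean, so a Poisson model fit to these data would understate the uncertainty.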
Standard Approaches Fail
- Poisson regression models are limited because they assume events are independent.
- Alternative models assume a particular dependence: negative binomial and generalized event count (GEC).
- OLS / ARIMA models use the wrong (Gaussian) distribution.
- Including a lagged endogenous count implies a growth rate model.
These are the motivating arguments for Brandt, Williams, Fordham and Pollins (2000: American Journal of Political Science) and Brandt and Williams (2001: Political Analysis).
Why does this matter?
- Little is known about the efficiency properties of event count models in the presence of dynamic mis-specification.
- If count data demonstrate serial dependence, how can we model this dependence?
- Further, if we fail to model this dependence, how biased / inconsistent / inefficient are the estimates we get?
- Can’t we just fix all of this with a lagged dependent variable like we do in most other models?
Limits of a lagged count model
- The exponentiated coefficient on the lagged variable is no longer an autocorrelation coefficient. It is a growth rate.
- The model is only appropriate for non-stationary or trending event counts, since the mean is an exponential function of time.
Just say NO to the lagged count model!
Let $z_t \sim \mathrm{Po}(\mu_t)$ with $\mu_t = \exp(X_t \delta + \rho z_{t-1})$, where the $X_t$ are i.i.d. The growth rate of this lagged Poisson regression model is the difference in the logged mean counts:
$$\ln(\mu_t) - \ln(\mu_{t-1}) = X_t \delta - X_{t-1} \delta + \rho z_{t-1} - \rho z_{t-2}.$$
Because the $X_t$ are i.i.d., $E[X_t \delta - X_{t-1} \delta] = 0$, so taking expectations gives
$$E[\ln(\mu_t) - \ln(\mu_{t-1})] = \rho E[z_{t-1} - z_{t-2}].$$
Unless $\rho = 0$ or $E[z_{t-1} - z_{t-2}] = 0$, this model implies a non-zero growth rate for the conditional mean. The coefficient ρ is a growth rate rather than an autocorrelation or discounting coefficient.
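The growth-rate behavior can be seen in a deterministic version of the lagged count model. This sketch makes two simplifying assumptions that are not part of the model above: the lagged count z_{t-1} is replaced by its mean μ_{t-1}, and X_t δ is held fixed at a constant. The parameter values are also assumed.

```r
# Deterministic skeleton of the lagged count model: replace z_{t-1} by mu_{t-1}
# and hold X_t delta fixed at a constant (both simplifications are assumptions).
delta0 <- 1      # assumed constant value of X_t delta
rho <- 0.2       # assumed coefficient on the lagged count
n_steps <- 6

mu <- numeric(n_steps)
mu[1] <- exp(delta0)                       # start with a lagged count of zero
for (t in 2:n_steps) {
  mu[t] <- exp(delta0 + rho * mu[t - 1])
}

growth <- diff(log(mu))                    # ln(mu_t) - ln(mu_{t-1})
round(mu, 2)
round(growth, 2)
```

Each log difference equals ρ times the previous change in the mean, so with these values the growth rate itself increases every period. The mean does not settle down, unlike an autoregression with a coefficient of 0.2.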
Desiderata of a count time series model
- Need models that can deal with trends in counts (PEWMA)
- Need models that can deal with cycles in counts (PAR(p))
- Diagnostics for model selection
- When are Gaussian-based models OK?
Models for count time series
There are two frameworks for count time series models:
- Observation driven models: past counts predict current counts.
  - PEWMA
  - PAR(p)
- Parameter driven models: parameters change over time.
  - Changepoint models
  - Latent dynamic parameter or factor models
This list of examples is by no means exhaustive. Count time series models fit into one of these approaches and there is a fair amount of observational equivalence across these modeling strategies.
Simple Diagnostics for Count Time Series
We most often want to know if the count time series are serially correlated. Cameron and Trivedi (1998) show that one can use standard time series diagnostics for serial correlation to determine whether counts should be modeled with a time series.
1 Standardize the count time series: for each observation subtract off the mean and divide by the standard deviation of the series (so just like finding a z-score).
2 Compute the autocorrelation function of the standardized counts.
Example: using autocorrelation functions to diagnose count serial correlation
This example is based on the data in StraitsMIDS-example
load("StraitsMIDS.RData")
# Make d into a ts() object
d <- as.ts(d)
# Compute the ACFs for the standardized data
acf(apply(d, 2, scale))
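The same two steps can be tried without the data file. The sketch below is a stand-in that uses simulated counts: the series names, the latent level process, and the seed are all assumptions, and the results say nothing about the actual China-Taiwan data.

```r
# Simulated stand-in for the real data: two count series with slowly moving means.
set.seed(123)
n <- 200
level <- exp(cumsum(rnorm(n, sd = 0.1)))           # persistent latent mean (assumed)
sim <- cbind(threats = rpois(n, 2 * level),
             actions = rpois(n, 3 * level))
sim <- as.ts(sim)

# Standardize each column (z-scores), then compute ACFs and cross-correlations.
z <- apply(sim, 2, scale)
acf_out <- acf(z, lag.max = 8, plot = FALSE)

# acf_out$acf is a (lag, series, series) array; the first slice is lag 0.
round(acf_out$acf[1:3, 1, 1], 2)
```

Setting plot = FALSE returns the correlations as an array instead of drawing the panel grid, which is convenient when the values are needed for later checks.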
ACF for China-Taiwan MIDS
[Figure: 4 x 4 grid of correlation plots for the standardized series China threats (Chnt), China actions (Chna), Taiwan threats (Twnt), and Taiwan actions (Twna). Diagonal panels are titled by series; off-diagonal panels are titled by pairs of series, such as "Chnt & Chna". x-axis: Lag; y-axis: ACF.]
ACF interpretation
- Plots on the diagonal give the autocorrelation functions.
- Plots off the diagonal are the cross-correlation functions.
- The first autocorrelation value is always 1 (why?)
- There is evidence of serial correlation in the China actions, Taiwan threats, and Taiwan actions series.
Two Observation Driven Models for Event Count Time Series
- PEWMA: Poisson Exponentially Weighted Moving Average. Models a moving mean for persistent event count data. This is used for time-varying or random walk count data.
- PAR(p): Poisson Autoregressive Model of Order p. Models a linear autoregressive, mean reverting event count series.
PEWMA: A Model for Persistent Counts
- Model for persistent count data: the Poisson exponentially weighted moving average (PEWMA).
- The PEWMA is a structural time series model.
- The model and method of estimation were originally proposed by Harvey and Fernandes (1989).
- Easily implemented: our implementation modifies the original to correct the transition equation as proposed by Shephard (1994).
The PEWMA model details
Measurement Equation:
$$\Pr(y_t \mid \mu_t) = \frac{\mu_t^{y_t} e^{-\mu_t}}{y_t!}$$
Transition Equation:
$$\mu_t = \mu^*_{t-1} \exp(X_t \delta + r_t)\, \eta_t, \quad \text{where } \eta_t \sim \mathrm{Beta}(\omega a_{t-1}, (1 - \omega) a_{t-1})$$
Conjugate Prior:
$$\mu^*_{t-1} \sim \Gamma(a_{t-1}, b_{t-1}).$$
PEWMA parameters
- The hyperparameter ω ∈ (0, 1] measures the dependence. Small values of ω indicate more dynamics and dependence in the data, while values near one indicate independence (i.e., the Poisson model).
- $\mu^*_{t-1}$ is the period t − 1 conditional mean.
- Values of the regressor coefficients, δ, can be interpreted as in a standard Poisson model.
- PEWMA nests Poisson: can use standard ML tests to evaluate dependence v. independence.
- The transition equation differs from Harvey and Fernandes. It is based on the gamma distributed transition results of Shephard (1994). It allows for a separate growth rate ($r_t$) in each period. The mean growth rate is zero.
PEWMA Forecast Function
Based on repeated substitutions of a and b, the forecast function for the one-step ahead prediction is:
$$\bar{y}_{T+1|T} = \frac{\exp(X_{T+1} \delta + r_{T+1}) \sum_{j=0}^{T-1} \omega^j y_{T-j}}{\sum_{j=0}^{T-1} \omega^j \exp(X_{T-j} \delta + r_{T-j})}.$$
This is an exponentially weighted moving average.
When T is large, $\bar{y}_{T+1|T}$ approaches
$$\mu_T = \omega \bar{y}_{T|T-1} + (1 - \omega) y_T,$$
which serves as the forecast for periods T + 1, ..., T + h.
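To see the exponential weighting at work, the sketch below evaluates the forecast function in the simplest case: no covariates and zero growth, so that exp(Xδ + r) = 1 in every period. That restriction, the function names, the simulated counts, and the value of ω are all assumptions for illustration; this is not the PESTS implementation.

```r
# One-step-ahead PEWMA forecast with no covariates and zero growth,
# so exp(X delta + r) = 1 in every period (a simplifying assumption).
pewma_forecast <- function(y, omega) {
  T_len <- length(y)
  w <- omega^(0:(T_len - 1))          # weights omega^j for j = 0, ..., T - 1
  sum(w * rev(y)) / sum(w)            # rev(y)[j + 1] is y_{T-j}
}

# Large-T approximation: ybar_{t+1|t} = omega * ybar_{t|t-1} + (1 - omega) * y_t.
ewma_recursion <- function(y, omega) {
  ybar <- y[1]                        # start the recursion at the first count
  for (t in seq_along(y)[-1]) {
    ybar <- omega * ybar + (1 - omega) * y[t]
  }
  ybar
}

set.seed(2010)
y <- rpois(200, lambda = 5)           # hypothetical count series
omega <- 0.7                          # assumed discount parameter
f_exact <- pewma_forecast(y, omega)
f_ewma <- ewma_recursion(y, omega)
c(exact = f_exact, recursion = f_ewma)
```

With 200 observations the two numbers agree to many decimal places, because the weight on the earliest counts, of order ω^T, is negligible. Smaller values of ω put more weight on the most recent count.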
PEWMA Interpretation
- Latent variable: the latent mean is a random walk.
- Coefficients of regressors: can use standard methods for Poisson regression. There are no impact multipliers because this is an EWMA.
- Nests the Poisson: test whether ω = 1 to see if you can just use a Poisson regression.
PAR(p) model
- An event count series can also be modeled using a “linear autoregressive process.”
- This process can be used to define the transition equation for a state space or non-linear filter model.
- In this specification, the counts today will depend on the p past values via an autoregression.
Why develop the PAR(p) model?
- Alternative specification for stationary count data series.
- Ease of interpretation: the predictive distribution is based on a linear function.
- Generalization: the AR model can account for higher order, finite lag structures.
- Diagnostics: ACF and PACF routines can be used for diagnostics.
PAR(p) model
Measurement Equation:
$$\Pr(y_t \mid m_t) = \frac{m_t^{y_t} e^{-m_t}}{y_t!},$$
Transition Equation:
$$m_t = \sum_{i=1}^{p} \rho_i y_{t-i} + \left(1 - \sum_{i=1}^{p} \rho_i\right) \mu_t, \qquad \mu_t = \exp(X_t \delta)$$
Conjugate Prior:
$$\Pr(m_t \mid Y_{t-1}) = \Gamma(\sigma_{t-1} m_{t-1}, \sigma_{t-1})$$
$$m_{t-1} = E[y_t \mid Y_{t-1}], \qquad \sigma_{t-1} = \mathrm{Var}[y_t \mid Y_{t-1}], \qquad m_{t-1} > 0, \; \sigma_{t-1} > 0$$
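The transition equation is a weighted average of the recent counts and the regression mean, which is what makes the series mean reverting. The sketch below computes m_t for p = 2. The helper par_mean is hypothetical and is not part of PESTS, and the coefficient values are assumed.

```r
# Conditional mean m_t from the PAR(p) transition equation.
# `y_lags` holds (y_{t-1}, ..., y_{t-p}); `mu_t` is exp(X_t delta).
par_mean <- function(y_lags, rho, mu_t) {
  stopifnot(length(y_lags) == length(rho))
  sum(rho * y_lags) + (1 - sum(rho)) * mu_t
}

rho <- c(0.5, 0.2)                   # assumed AR coefficients for p = 2
mu_t <- 3                            # assumed value of exp(X_t delta)

m_shock <- par_mean(c(2, 4), rho, mu_t)   # 0.5*2 + 0.2*4 + 0.3*3 = 2.7
m_rest <- par_mean(c(3, 3), rho, mu_t)    # past counts at mu_t leave m_t = mu_t
c(m_shock = m_shock, m_rest = m_rest)
```

When the lagged counts equal μ_t, the conditional mean stays at μ_t. Departures from μ_t carry forward with weights ρ_i, so the remaining weight 1 − Σρ_i pulls the series back toward the regression mean.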
PAR(p) likelihood
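The forecast density for the one-step ahead distribution is
$$\Pr(y_t \mid Y_{t-1}) = \int_{\theta} \underbrace{\Pr(y_t \mid \theta_t)}_{\text{Measurement}} \cdot \underbrace{\Pr(\theta_t \mid Y_{t-1})}_{\text{Transition}} \, d\theta = \frac{\Gamma(\sigma_{t|t-1} m_{t|t-1} + y_t)}{\Gamma(y_t + 1)\, \Gamma(\sigma_{t|t-1} m_{t|t-1})} \, \sigma_{t|t-1}^{\sigma_{t|t-1} m_{t|t-1}}$$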