Time Series Models for Event Counts, I

Patrick T. Brandt
University of Texas, Dallas
July 2010

Agenda

- Introduction to basic event count time series
- Examples of why we need separate models for these kinds of data
- PEWMA and PAR(p) introduction
- Fitting and interpreting PEWMA and PAR(p) models using PESTS: dynamic inferences
- Changepoint models for count data
- Some recent extensions and new models

Preface / Getting Started

- Get R from your favorite CRAN mirror. The mirror list is at: http://cran.r-project.org/mirrors.html
- Get the R source code for PESTS from http://www.utdallas.edu/~pbrandt/code/pests.r
- These slides, data, and R code for examples are at http://www.utdallas.edu/~pbrandt/code/count-examples
- Put pests.r and the data files you are going to use in the same folder.

1. Event Count Models
   - Data Examples
   - Poisson Models
   - Negative Binomial Models
2. Event Count Time Series
   - Existing approaches
   - Models for time series of counts
   - PEWMA
   - PAR(p)

Example: Mayhew's Legislation Data

Example: Militarized Interstate Disputes (MIDS)

[Figure: the MIDS series plotted against Time, and the ACF of the series plotted against Lag.]

Chinese and Taiwanese MIDS
[Figure: China and Taiwan threats and actions, MIDS: 1950-2001. Panels: China threats, China actions, Taiwan threats, Taiwan actions, each plotted against Time.]

Count data models

Count data are the result of a process of measuring the number of discrete events over some period of time.

- Typically, these models assume that the process that generates the events is independent of time (t).
- This means that the process is memoryless.
- The times between events are assumed to be independent and exponentially distributed.
- This is a very strong set of restrictions: most event data violate them in one or more ways.

Event Count Models

Typically, one of several models is used to fit a regression to count data:

- Poisson regression
- Negative binomial regression
- Generalized event count
- Generalized estimating equations

Poisson Models

The standard approach to modeling event count data is to use a Poisson distribution:

$$\Pr(y_t \mid \mu_t) = \frac{\mu_t^{y_t} e^{-\mu_t}}{y_t!}.$$

Estimation of the mean parameter is accomplished via maximum likelihood methods.
Note that for the Poisson model, $E(y_t) = V(y_t) = \mu_t$.
A Poisson regression model can be created by the parameterization

$$\mu_t = \exp(X_t \delta).$$

Negative binomial models

This model allows for $V(y) > E(y)$, which is common in count data.
This is known as overdispersion. The negative binomial distribution is given by

$$\Pr(y_t \mid \mu_t, \nu_t) = \frac{\Gamma(y_t + \nu_t)}{y_t!\,\Gamma(\nu_t)} \left(\frac{\nu_t}{\nu_t + \mu_t}\right)^{\nu_t} \left(\frac{\mu_t}{\nu_t + \mu_t}\right)^{y_t}.$$

The $\nu_t$ parameter captures the level of overdispersion, or how much larger the variance is than the mean: the smaller $\nu_t$ is, the greater the overdispersion.
We can make this a regression model by defining $\mu_t = \exp(X_t \delta)$.

How the negative binomial captures overdispersion

The conditional mean for the negative binomial (NB) regression model is

$$E[y_t \mid X_t] = \mu_t = \exp(X_t \delta).$$

The conditional variance is

$$V[y_t \mid X_t] = \mu_t \left(1 + \frac{\mu_t}{\nu_t}\right) = \exp(X_t \delta) \left(1 + \frac{\exp(X_t \delta)}{\nu_t}\right).$$

This variance will be unidentified since the term $\nu_t$ has a $t$ index: there would be one overdispersion parameter per observation.

Identifying the negative binomial

The common assumption is that the variance parameter $\nu_t$ is the same across all of the observations (this same assumption is used in the subsequent time series models).
If we assume that $\nu_t = \alpha^{-1}$ and $\alpha > 0$, then

$$V[y_t \mid X_t] = \mu_t \left(1 + \frac{\mu_t}{\alpha^{-1}}\right) = \mu_t + \alpha \mu_t^2.$$

(So now you know how to interpret correctly that $\alpha$ parameter reported from Stata for nbreg.)
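As a numerical check on the mean and variance results above, here is a minimal Python sketch using only the standard library. The function names (`poisson_pmf`, `negbin_pmf`, `moments`) and the values of `mu` and `alpha` are illustrative assumptions and are not part of PESTS or the slides. It evaluates both probability functions on the log scale and sums them over a long grid of counts to recover the moments.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability mu^y e^{-mu} / y!, computed on the log scale."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y, mu, alpha):
    """Negative binomial probability with constant nu = 1 / alpha."""
    nu = 1.0 / alpha
    log_p = (math.lgamma(y + nu) - math.lgamma(y + 1) - math.lgamma(nu)
             + nu * math.log(nu / (nu + mu))
             + y * math.log(mu / (nu + mu)))
    return math.exp(log_p)

def moments(pmf, upper=500):
    """Mean and variance obtained by summing the pmf over 0..upper-1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Illustrative values only (assumptions, not estimates from any data set).
mu, alpha = 4.0, 0.5

pois_mean, pois_var = moments(lambda y: poisson_pmf(y, mu))
nb_mean, nb_var = moments(lambda y: negbin_pmf(y, mu, alpha))

print("Poisson mean, variance:", round(pois_mean, 6), round(pois_var, 6))
print("NB mean, variance:     ", round(nb_mean, 6), round(nb_var, 6))
print("mu + alpha * mu^2:     ", mu + alpha * mu ** 2)
```

With these values the Poisson variance matches its mean, while the negative binomial variance should agree with $\mu_t + \alpha \mu_t^2$ up to the truncation error of the finite sum.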
Standard Approaches Fail

- Poisson regression models are limited because they assume events are independent.
- Alternative models assume a particular dependence: negative binomial and generalized event count (GEC).
- OLS / ARIMA models use the wrong (Gaussian) distribution.
- Including a lagged endogenous count implies a growth rate model.

These are the motivating arguments for Brandt, Williams, Fordham and Pollins (2000: American Journal of Political Science) and Brandt and Williams (2001: Political Analysis).

Why does this matter?

- Little is known about the efficiency properties of event count models in the presence of dynamic mis-specification.
- If count data demonstrate serial dependence, how can we model this dependence?
- Further, if we fail to model this dependence, how biased / inconsistent / inefficient are the estimates we get?
- Can't we just fix all of this with a lagged dependent variable like we do in most other models?

Limits of a lagged count model

- The exponentiated coefficient on the lagged variable is no longer an autocorrelation coefficient. It is a growth rate.
- The model is only appropriate for non-stationary or trending event counts, since the mean is an exponential function of time.

Just say NO to the lagged count model!

Let $z_t \sim \mathrm{Po}(\mu_t)$ with $\mu_t = \exp(X_t \delta + \rho z_{t-1})$, where the $X_t$ are i.i.d.
The growth rate of this lagged Poisson regression model is the difference in the logged mean counts:

$$\ln(\mu_t) - \ln(\mu_{t-1}) = X_t \delta - X_{t-1} \delta + \rho z_{t-1} - \rho z_{t-2}.$$

Taking expectations, and noting that $E[X_t \delta] = E[X_{t-1} \delta]$ because the $X_t$ are i.i.d., gives

$$E[\ln(\mu_t) - \ln(\mu_{t-1})] = \rho E[z_{t-1} - z_{t-2}].$$

Unless $\rho = 0$ or $E[z_{t-1} - z_{t-2}] = 0$, this model implies a non-zero growth rate for the conditional mean.
The coefficient $\rho$ is a growth rate rather than an autocorrelation or discounting coefficient.

Desiderata of a count time series model

- Need models that can deal with trends in counts (PEWMA)
- Need models that can deal with cycles in counts (PAR(p))
- Diagnostics for model selection
- When are Gaussian-based models OK?

Models for count time series

There are two frameworks for count time series models:

- Observation driven models: past counts predict current counts.
  - PEWMA
  - PAR(p)
- Parameter driven models: parameters change over time.
  - Changepoint models
  - Latent dynamic parameter or factor models

This list of examples is by no means exhaustive. Count time series models fit into one of these approaches and there is a fair amount of observational equivalence across these modeling strategies.

Simple Diagnostics for Count Time Series

We most often want to know if the count time series are serially correlated.
Cameron and Trivedi (1998) show that one can use standard time series diagnostics for serial correlation to determine whether counts should be modeled with a time series model.
1. Standardize the count time series: for each observation, subtract off the mean and divide by the standard deviation of the series (just like finding a z-score).
2. Compute the autocorrelation function of the standardized counts.
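The two diagnostic steps above can be sketched in a few lines of Python using only the standard library. Everything here is an illustrative assumption: the simulated series (a Poisson count whose log mean follows a persistent latent AR(1) process), the helper names `rpois`, `standardize`, and `acf`, and the seed. The flagging threshold is the usual large-sample white-noise band of 1.96 / sqrt(n), not a procedure taken from the slides.

```python
import math
import random

def rpois(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit, k, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def standardize(counts):
    """Step 1: subtract the series mean and divide by its standard deviation."""
    n = len(counts)
    mean = sum(counts) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / (n - 1))
    return [(c - mean) / sd for c in counts]

def acf(series, max_lag):
    """Step 2: sample autocorrelations for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

# Hypothetical data: counts whose log mean follows a persistent AR(1) process.
rng = random.Random(2010)
n = 500
state, counts = 0.0, []
for _ in range(n):
    state = 0.9 * state + rng.gauss(0.0, 0.3)
    counts.append(rpois(math.exp(1.0 + state), rng))

z = standardize(counts)
rho = acf(z, 10)
band = 1.96 / math.sqrt(n)  # approximate 95% band under no serial correlation

for lag, r in enumerate(rho):
    flag = "*" if lag > 0 and abs(r) > band else ""
    print(f"lag {lag:2d}  acf {r:6.3f} {flag}")
```

Because the autocorrelation function is invariant to shifts and rescaling, the standardized and raw counts give the same ACF; standardizing mainly puts the series on a z-score scale for plotting and comparison. Autocorrelations flagged at low lags suggest that a time series model for the counts is worth considering.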
