Mo deling Time Series Count Data: An Autoregressive
Conditional Poisson Mo del
Andr eas Heinen
Department of Economics
University of California, San Diego
9500 Gilman Drive
La Jolla CA 92037
November 2000
Abstract
This pap er intro duces new mo dels for time series count data. The Autoregressive
Conditional Poisson mo del ACP makes it p ossible to deal with issues of discrete-
ness, overdisp ersion variance greater than the mean and serial correlation. A fully
parametric approachis taken and a marginal distribution for the counts is sp eci ed,
where conditional on past observations the mean is autoregressive. This enables to
attain improved inference on co ecients of exogenous regressors relative to static Pois-
son regression, which is the main concern of the existing literature, while mo deling the
serial correlation in a exible way. Avariety of mo dels, based on the double Poisson
distribution of Efron 1986 is intro duced, which in a rst step intro duce an additional
disp ersion parameter and in a second step make this disp ersion parameter time-varying.
All mo dels are estimated using maximum likeliho o d which makes the usual tests avail-
able. In this framework auto correlation can b e tested with a straightforward likeliho o d
ratio test, whose simplicity is in sharp contrast with test pro cedures in the latentvari-
able time series count mo del of Zeger 1988. The mo dels are applied to the time series
of monthly p olio cases in the U.S b etween 1970 and 1983 as well as to the daily number
of price change durations of :75$ on the IBM sto ck. A :75$ price-change duration is
de ned as the time it takes the sto ck price to moveby at least :75$. The variable of
interest is the daily numb er of such durations, which is a measure of intradaily volatil-
ity, since the more volatile the sto ck price is within a day, the larger the counts will b e.
The ACP mo dels provide go o d density forecasts of this measure of volatility. 1
1 Intro duction
Many interesting empirical questions can b e addressed by mo deling a time series of count
data. Examples can be found in a great variety of contexts. In the area of accident
prevention, Johansson 1996 used time series counts to assess the e ect of lowered sp eed
limits on the numb er of road casualties. In epidemiology, time series counts arise naturally
in the study of the incidence of a disease. A prominent example is the time series of
monthly cases of p olio in the U.S. which has b een studied extensively by Zeger 1988 and
Brannas and Johansson 1994. In the area of nance, b esides the applications mentioned
in Cameron and Trivedi 1996, counts arise in market microstructure as so on as one starts
lo oking at tick-by-tick data. The price pro cess for a sto ck can b e viewed as a sum of discrete
price changes. The daily numb er of these price changes constitutes a time series of counts
whose prop erties are of interest.
Most of these applications involve relatively rare events which makes the use of the
normal distribution questionable. Thus, mo deling this typ e of series requires one to deal
explicitly with the discreteness of the data as well as its time series prop erties. Neglect-
ing either of these two characteristics would lead to p otentially serious missp eci cation.
A typical issue with time series data is auto correlation and a common feature of count
data is overdisp ersion the variance is larger than the mean. Both of these problems are
addressed simultaneously by using an autoregressive conditional Poisson mo del ACP. In
the simplest mo del counts haveaPoisson distribution and their mean, conditional on past
observations, is autoregressive. Whereas, conditional on past observations, the mo del is
equidisp ersed the variance is equal to the mean, it is unconditionally overdisp ersed. A
fully parametric approach and cho ose to mo del the conditional distribution explicitly and
make sp eci c assumptions ab out the nature of the auto correlation in the series. A simi-
lar mo deling strategy has b een explored indep endently by Rydb erg and Shephard 1998,
Rydb erg and Shephard 1999b, and Rydb erg and Shephard 1999a. Two generalisations
of their framework are intro duced in this pap er. The rst consists of replacing the Pois-
son by the double Poisson distribution of Efron 1986, which allows for either under- or
overdisp ersion in the marginal distribution. In this context, two common variance functions
will b e explored. Finally, an extended version of the mo del is prop osed, which allows for
separate mo dels of mean and of variance. It is shown that this mo del p erforms well in a
nancial application and that it delivers very good density forecasts, which are tested by
using the techniques prop osed by Dieb old and Tay 1997.
The main advantages of this mo del are that it is exible, parsimomious and easy to
estimate using maximum likeliho o d. Results are easy to interpret and standard hyp othesis
test are available. In addition, given that the auto correlation and the density are mo deled
explicitly, the mo del is well suited for b oth p oint and density forecasts, which can be
of interest in many applications. Finally, due to its similarity with the autoregressive
conditional heteroskedasticity ARCH mo del of Engle 1982, the present framework can
be extended to most of the mo dels in the ARCH class; for a review see Bollerslev, Engle,
and Nelson 1994.
The pap er is organised as follows. A great numb er of mo dels of time series count data
have b een prop osed, which will b e reviewed in section 2. In section 3 the basic autoregressive
Poisson mo del is intro duced and some of its prop erties are discussed. Section 4 intro duces 2
twoversions of the Double Autoregressive Conditional Poisson DACP mo del, along with
their prop erties. In section 5 the mo del is generalised to allow for time-varying variance,
applications to the daily numb er of price changes on IBM and to the numb er of new p olio
cases are presented is section 6; section 7 concludes.
2 A review of mo dels for count data in time series
Many di erent approaches have b een prop osed to mo del time series count data. Good
reviews can be found b oth in Cameron and Trivedi 1979, Chapter 7 and in MacDonald
and Zucchini 1997, Chapter 1. Markov chains are one way of dealing with count data
in time series. The metho d consists of de ning transition probabilities between all the
p ossible values that the dep endent variable can take and determining, in the same way
as in usual time series analysis, the appropriate order for the series. This metho d is only
reasonable, though, when there are very few p ossible values that the observations can take.
A prominent area of application for Markovchains is binary data. As so on as the number
of values that the dep endentvariable takes gets to o large, these mo dels lose tractability.
Discrete Autoregressive Moving Average DARMA mo dels are mo dels for time series
count data with prop erties similar to those of ARMA pro cesses found in traditional time
series analysis. They are probabilistic mixtures of discrete i.i.d. random variables with
suitably chosen marginal distribution. One of the problems asso ciated with these mo d-
els seems to be the diculty of estimating them. An application to the study of daily
precipitation can be found in Chang and Delleur 1984. The daily level of precipitation
is transformed into a discrete variable based on its magnitude. The metho d of moments
is used to estimate the parameters of the mo del by tting the theoretical auto correlation
function of each mo del to their sample counterpart. Estimation of the mo del seems quite
cumb ersome and the mo del is only applied to a time series which can take at most three
values.
McKenzie 1985 surveys various mo dels based on "binomial thinning". In those mo d-
els, the dep endentvariable y is assumed to b e equal to the sum of an error term with some
t
presp eci ed distribution and the result of y draws from a Bernoulli which takes value
t 1
1 with some probability and 0 otherwise. This guarantees that the dep endent variable
takes only integer values. The parameter in that mo del is analogous to the co ecient
on the lagged value in an AR1 mo del. This mo del, called INAR1, has the same au-
to correlations as the AR1 mo del of traditional time series analysis, which makes it its
discrete counterpart. This family of mo dels has b een generalised to include integer valued
ARMA pro cesses a well as to incorp orate exogenous regressors. The problem with this typ e
of mo dels is the diculty in estimating them. Many mo dels have b een prop osed and the
emphasis was put more on their sto chastic prop erties than on how to estimate them.
Hidden Markovchains, advo cated by MacDonald and Zucchini 1997 are an extension
of the basic Markov chains mo dels, in which various regimes characterising the p ossible
values of the mean are identi ed. It is then assumed that the transition from one to another
of these regimes is governed by a Markov chain. One of the problems of this approach is
that it there is no accepted way of determining the appropriate order for the Markovchain.
Whereas in some cases there is a natural interpretation for what might constitute a suitable
regime, in most applications, and in particular in the applications considered in this pap er, 3
this is not the case. Another problem is that the numb er of parameters to b e estimated can
get big, esp ecially when the number of regimes is large. Finally, the results are, in most
cases, not very easy to interpret.
Harvey and Fernandes 1989 use state-space mo dels with conjugate prior distributions.
Counts are mo deled as a Poisson distribution whose mean itself is drawn from a gamma
distribution. The Gamma distribution dep ends on two parameters a and b which are
treated as latent variables and whose law of motion is a = !a and b = !b .
t 1 t 1
tjt 1 tjt 1
As a result, the mean of the Poisson distribution is taken from a gamma with constant
mean but increasing variance. Estimation is done by maximum likeliho o d and the Kalman
lter is used to up date the latentvariables.
In the static case overdisp ersion is usually viewed as the result of unobserved hetero-
geneity. One way of dealing with this problem is to keep an equidisp ersed distribution,
but intro duce new regressors in the mean equation which are b elieved to capture this het-
erogeneity. Zeger and Qaqish 1988 apply ths intuition to the time series case and add
lagged values of the dep endentvariable to the set of regressors in a static Poisson regression.
They adopt a generalised linear mo del formulation in which conditional mean and variance
are mo deled instead of marginal moments. For count data, the authors prop ose using the
Poisson distribution with the log , its asso ciated link function. The mean is set equal to a
linear combination of exogenous regressors and the time series dep endence is accounted for
by a weighted sum of past deviations of the dep endentvariable from the linear predictor:
h i
P
0 0
q
log = x + log y x where y = maxy ;c. These mo dels have
t t i i
t t
t i t i
i=1
not b een used very much in applications. One of the weaknesses of this sp eci cation is that
had ho c assumptions are needed to handle zeros.
Zeger 1988 extends the generalised linear mo dels and intro duces a latentmultiplicative
2
autoregressive term with unit exp ectation , variance and auto correlation which
t
is resp onsible for intro ducing b oth autoregression and overdisp ersion into the mo del. The
dep endentvariable is is assumed to b e a function of exogenous regressors x with conditional
t
mean = expx , where is the co ecient of interest. Conditionally on the latent
t t
variable, the mo del is equidisp ersed E [y j ]=V [y j ]= , but the marginal variance
t t t t t t
2 2
. In dep ends b oth on the marginal mean and its square: E [y ]= and V [y j ]= +
t t t t t t
t
order to estimate this mo del with maximum likeliho o d, one would need to sp ecify a density
for y j and for . In most cases, no closed-form solution would be available. Instead, a
t t t
quasilikeliho o d metho d is adopted which only requires knowledge of mean, variance and
covariances of y . Given estimates of the parameters of the latent variable, obtained by
t
the metho d of moments, the variance-covariance matrix V of the dep endent variable is
2
formed: V = A + AR A, where A = diag and R is the auto correlation matrix of the
i
j;k
= jj k j. The quasilikeliho o d approach then consists in using the latent variable R
inverse of the variance matrix V as a weight in the rst order conditions. Since inversion
of V is quite cumb ersome when the time series is long, an approximation is prop osed.
This metho d can be viewed as a count data analog of the Co chrane-Orcutt metho d for
normally distributed time series in which all the serial correlation is assumed to come from
the error term. The metho d has b een applied to sudden infant death syndrom by Campb ell
1994. While conceptually this metho d is quite close to what is prop osed in this pap er in
this pap er, there are nonetheless imp ortant di erences in that it is fundamentally a static 4
mo del, whereas the mo del in this pap er is an explicitly dynamic one. The interest is not
limited to getting correct inference ab out the parameters on the exogenous variables but
also lies in adequately capturing the dynamics of the system. In order to achieve this, a
more parametric approach is taken, which, among other things, allows forecasting.
3 The ACP Mo del
The rst mo del prop osed in this pap er has counts follow a Poisson distribution with an
autoregressive mean. The Poisson distribution is the natural starting p oint for counts. One
characteristic of the Poisson distribution is that the mean is equal to the variance. This
prop erty is referred to as equidisp ersion. Most count data however exhibit overdisp ersion.
Mo deling the mean as an autoregressive pro cess generates overdisp ersion in even the simple
Poisson case.
In the simplest mo del, the counts are generated byaPoisson distribution
N jF = P ; 3.1
t t 1 t
with an autoregressive conditional intensity as in the ACD mo del of Engle and Russell
1998 or the conditional variance in the GARCH Generalised Autoregressive Conditional
Heteroskedasticity mo del of Bollerslev 1986:
p q
X X
E [N jF ]= = ! + N + : 3.2
t t 1 t j t j j t j
j =1 j =1
The following prop erties of the unconditional moments of the ACP can b e estabished.
Prop osition 3.1 Unconditional mean of the ACDPp,q or the ACPp,q. The
unconditional mean of the DACPp,q or the ACPp,q is
!
E [N ]= = :
t
P
maxp;q
1 +
j j
j =0
This prop osition shows that, as long as the sum of the autoregressive co ecients is less
than 1, the mo del is stationary and the expression for its mean is identical to the mean of
an ARMA pro cess.
Prop osition 3.2 Unconditional variance of the ACP1,1 Mo del. The unconditio-
nal variance of the ACP1,1 model, when the conditional mean is given by
E [N jF ]= = ! + N + ;
t t 1 t 1 1 t 1 1 t 1
is equal to
2 2
1 + +
1 1
1 2
V [N ]= = :
t
2
1 +
1 1 5
Proof of Proposition 3.2. Pro of in app endix
Prop osition 3.2 shows that the ACP exhibits overdisp ersion, even though it uses an equidis-
p ersed marginal distribution. The mo del is overdisp ersed, as long as 6= 0 and the amount
1
of overdisp ersion is an increasing function of and also, to a lesser extent, of . The
1 1
following prop osition establishes an expression for the auto correlation function of the ACP.
Prop osition 3.3 Auto correlation of the ACP1,1 Mo del. The unconditional au-
tocorrelation of the ACP1,1 model is given by
1 +
1 1 1 1
s 1
Corr[N ;N ]= + :
t t s 1 1
2
2
1 + +
1 1
1
Proof of Proposition 3.3. Pro of in app endix
It can b e shown that the result in Prop osition 3.3 holds also for the GDACP develop ed in
the next section and for b oth the exp onential and the Weibull versions of the ACD mo del
of Engle and Russell 1998.
For the ACP the likeliho o d is
l =N ln lny ! ;
t t t t t
where is a vector containing all the parameters of the autoregressive conditional intensity
and the conditional intensity is autoregressive as in 3.2. The score and Hessian take the
t
very simple form:
@l N @
t t t t
= ;
@ @
t
2 2
N @ @ l
t t t
= ;
2
2 2
@ @
t
where
p
X
@ @
0
t t 1
= z + ; 3.3
t
t
@ @
t=1
and
z =[1;N ;N ;::: ;N ; ; ;::: ; ] : 3.4
t t 1 t 2 t p t 1 t 2 q t q
The score can easily be seen to be equal to zero in exp ectation. It is of interest in this
mo del to test whether there is signi cant auto correlation. In most time series applications
a massive rejection can b e exp ected. This can b e done very simply with a likeliho o d ratio 6
test LR. The statistic will b e equal to twice the di erence b etween the unrestricted and the
2
restricted likeliho o ds, which follows the usual distribution with two degrees of freedom.
Both the restricted and the unrestricted mo dels are easy to estimate. The simplicity of
this test contrast sharply with the diculties asso ciated with tests and estimation of the
auto correlation of a latentvariable in the mo del of Zeger 1988 whichhave b een recently
addressed in Davis and Wang 2000.
4 The DACP Mo del
This section intro duces two mo dels based on the double Poisson distribution, which di er
only by their variance function. The DACP1 has its variance prop ortional to the the mean,
whereas the DACP2 has a quadratic variance function.
In some cases one mightwant to break the link b etween overdisp ersion and serial corre-
lation. It is quite probable that the overdisp ersion in the data is not attributable solely to
the auto correlation, but also to other factors, for instance unobserved heterogeneity. It is
also p ossible that the amount of overdisp ersion in the data is less than the overdipsersion
resulting from the auto correlation, in which case an underdisp ersed marginal distribution
might b e appropriate. In order to account for these p ossibilities the Poisson is replaced by
the double Poisson, intro duced by Efron 1986 in the regression context, which is a natural
extension of the Poisson mo del and allows one to break the equality between conditional
mean and variance. This density is obtained as an exp onential combination with parameter
of the Poisson density of the observation y with mean and of the Poisson with mean
equal to the observation y , which can b e thought of as the likeliho o d function taken at its
maximum value.
The density of the Double Poisson is:
y