Modeling Count Data: An Autoregressive Conditional Poisson Model

Andreas Heinen

Department of Economics
University of California, San Diego
9500 Gilman Drive
La Jolla CA 92037
[email protected]

November 2000

Abstract

This paper introduces new models for time series count data. The Autoregressive Conditional Poisson model (ACP) makes it possible to deal with issues of discreteness, overdispersion (variance greater than the mean) and serial correlation. A fully parametric approach is taken and a marginal distribution for the counts is specified, where conditional on past observations the mean is autoregressive. This makes it possible to attain improved inference on coefficients of exogenous regressors relative to static Poisson regression, which is the main concern of the existing literature, while modeling the serial correlation in a flexible way. A variety of models, based on the double Poisson distribution of Efron (1986), is introduced, which in a first step introduce an additional dispersion parameter and in a second step make this dispersion parameter time-varying. All models are estimated using maximum likelihood, which makes the usual tests available. In this framework autocorrelation can be tested with a straightforward likelihood ratio test, whose simplicity is in sharp contrast with test procedures in the latent-variable time series count model of Zeger (1988). The models are applied to the time series of monthly polio cases in the U.S. between 1970 and 1983, as well as to the daily number of price-change durations of $.75 on the IBM stock. A $.75 price-change duration is defined as the time it takes the stock price to move by at least $.75. The variable of interest is the daily number of such durations, which is a measure of intradaily volatility, since the more volatile the stock price is within a day, the larger the counts will be. The ACP models provide good density forecasts of this measure of volatility.

1 Introduction

Many interesting empirical questions can be addressed by modeling a time series of count data. Examples can be found in a great variety of contexts. In the area of accident prevention, Johansson (1996) used time series counts to assess the effect of lowered speed limits on the number of road casualties. In epidemiology, time series counts arise naturally in the study of the incidence of a disease. A prominent example is the time series of monthly cases of polio in the U.S., which has been studied extensively by Zeger (1988) and Brannas and Johansson (1994). In the area of finance, besides the applications mentioned in Cameron and Trivedi (1996), counts arise in market microstructure as soon as one starts looking at tick-by-tick data. The price process for a stock can be viewed as a sum of discrete price changes. The daily number of these price changes constitutes a time series of counts whose properties are of interest.

Most of these applications involve relatively rare events, which makes the use of the normal distribution questionable. Thus, modeling this type of series requires one to deal explicitly with the discreteness of the data as well as its time series properties. Neglecting either of these two characteristics would lead to potentially serious misspecification. A typical issue with time series data is autocorrelation, and a common feature of count data is overdispersion (the variance is larger than the mean). Both of these problems are addressed simultaneously by using an autoregressive conditional Poisson model (ACP). In the simplest model counts have a Poisson distribution and their mean, conditional on past observations, is autoregressive. Whereas, conditional on past observations, the model is equidispersed (the variance is equal to the mean), it is unconditionally overdispersed. A fully parametric approach is taken: the conditional distribution is modeled explicitly and specific assumptions are made about the nature of the autocorrelation in the series. A similar modeling strategy has been explored independently by Rydberg and Shephard (1998), Rydberg and Shephard (1999b), and Rydberg and Shephard (1999a). Two generalisations of their framework are introduced in this paper. The first consists of replacing the Poisson by the double Poisson distribution of Efron (1986), which allows for either under- or overdispersion in the marginal distribution. In this context, two common variance functions will be explored. Finally, an extended version of the model is proposed, which allows for separate models of mean and of variance. It is shown that this model performs well in a financial application and that it delivers very good density forecasts, which are tested by using the techniques proposed by Diebold and Tay (1997).

The main advantages of this model are that it is flexible, parsimonious and easy to estimate using maximum likelihood. Results are easy to interpret and standard hypothesis tests are available. In addition, given that the autocorrelation and the density are modeled explicitly, the model is well suited for both point and density forecasts, which can be of interest in many applications. Finally, due to its similarity with the autoregressive conditional heteroskedasticity (ARCH) model of Engle (1982), the present framework can be extended to most of the models in the ARCH class; for a review see Bollerslev, Engle, and Nelson (1994).

The paper is organised as follows. A great number of models of time series count data have been proposed, which will be reviewed in section 2. In section 3 the basic autoregressive Poisson model is introduced and some of its properties are discussed. Section 4 introduces two versions of the Double Autoregressive Conditional Poisson (DACP) model, along with their properties. In section 5 the model is generalised to allow for time-varying variance. Applications to the daily number of price changes on IBM and to the number of new polio cases are presented in section 6; section 7 concludes.

2 A review of models for count data in time series

Many different approaches have been proposed to model time series count data. Good reviews can be found both in Cameron and Trivedi (1979, Chapter 7) and in MacDonald and Zucchini (1997, Chapter 1). Markov chains are one way of dealing with count data in time series. The method consists of defining transition probabilities between all the possible values that the dependent variable can take and determining, in the same way as in usual time series analysis, the appropriate order for the series. This method is only reasonable, though, when there are very few possible values that the observations can take. A prominent area of application for Markov chains is binary data. As soon as the number of values that the dependent variable takes gets too large, these models lose tractability.

Discrete Autoregressive Moving Average (DARMA) models are models for time series count data with properties similar to those of ARMA processes found in traditional time series analysis. They are probabilistic mixtures of discrete i.i.d. random variables with suitably chosen marginal distribution. One of the problems associated with these models seems to be the difficulty of estimating them. An application to the study of daily precipitation can be found in Chang and Delleur (1984). The daily level of precipitation is transformed into a discrete variable based on its magnitude. The method of moments is used to estimate the parameters of the model by fitting the theoretical autocorrelation function of each model to its sample counterpart. Estimation of the model seems quite cumbersome and the model is only applied to a time series which can take at most three values.

McKenzie (1985) surveys various models based on "binomial thinning". In those models, the dependent variable $y_t$ is assumed to be equal to the sum of an error term with some prespecified distribution and the result of $y_{t-1}$ draws from a Bernoulli variable which takes value 1 with some probability $\alpha$ and 0 otherwise. This guarantees that the dependent variable takes only integer values. The parameter $\alpha$ in that model is analogous to the coefficient on the lagged value in an AR(1) model. This model, called INAR(1), has the same autocorrelations as the AR(1) model of traditional time series analysis, which makes it its discrete counterpart. This family of models has been generalised to include integer-valued ARMA processes as well as to incorporate exogenous regressors. The problem with this type of models is the difficulty in estimating them. Many models have been proposed and the emphasis was put more on their stochastic properties than on how to estimate them.
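The thinning mechanism is straightforward to simulate. The following minimal sketch (illustrative parameter values; a Poisson innovation term is assumed) checks that the first-order autocorrelation of an INAR(1) series is close to the thinning probability $\alpha$, just as it would be for a Gaussian AR(1):

```python
import numpy as np

def simulate_inar1(alpha, lam, n, seed=0):
    """Simulate INAR(1): y_t = Binomial(y_{t-1}, alpha) + e_t, e_t ~ Poisson(lam).
    Binomial thinning keeps y_t integer-valued."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=int)
    y[0] = rng.poisson(lam / (1.0 - alpha))  # start near the stationary mean
    for t in range(1, n):
        # each of the y_{t-1} units survives with probability alpha
        y[t] = rng.binomial(y[t - 1], alpha) + rng.poisson(lam)
    return y

y = simulate_inar1(alpha=0.5, lam=2.0, n=100_000)
acf1 = np.corrcoef(y[:-1], y[1:])[0, 1]
print(y.mean(), acf1)  # mean near lam/(1-alpha) = 4, acf1 near alpha = 0.5
```

With Poisson innovations the stationary marginal distribution is itself Poisson with mean $\lambda/(1-\alpha)$, which is what makes this the natural discrete counterpart of the AR(1).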

Hidden Markov chains, advocated by MacDonald and Zucchini (1997), are an extension of the basic Markov chain models, in which various regimes characterising the possible values of the mean are identified. It is then assumed that the transition from one to another of these regimes is governed by a Markov chain. One of the problems of this approach is that there is no accepted way of determining the appropriate order for the Markov chain. Whereas in some cases there is a natural interpretation for what might constitute a suitable regime, in most applications, and in particular in the applications considered in this paper, this is not the case. Another problem is that the number of parameters to be estimated can get big, especially when the number of regimes is large. Finally, the results are, in most cases, not very easy to interpret.

Harvey and Fernandes (1989) use state-space models with conjugate prior distributions. Counts are modeled as a Poisson distribution whose mean itself is drawn from a gamma distribution. The gamma distribution depends on two parameters $a$ and $b$, which are treated as latent variables and whose law of motion is $a_{t|t-1} = \omega a_{t-1}$ and $b_{t|t-1} = \omega b_{t-1}$. As a result, the mean of the Poisson distribution is taken from a gamma with constant mean but increasing variance. Estimation is done by maximum likelihood and the Kalman filter is used to update the latent variables.

In the static case overdispersion is usually viewed as the result of unobserved heterogeneity. One way of dealing with this problem is to keep an equidispersed distribution, but introduce new regressors in the mean equation which are believed to capture this heterogeneity. Zeger and Qaqish (1988) apply this intuition to the time series case and add lagged values of the dependent variable to the set of regressors in a static Poisson regression. They adopt a generalised linear model formulation in which conditional mean and variance are modeled instead of marginal moments. For count data, the authors propose using the Poisson distribution with the log, its associated link function. The mean is set equal to a linear combination of exogenous regressors and the time series dependence is accounted for by a weighted sum of past deviations of the dependent variable from the linear predictor:
$$\log(\lambda_t) = x_t'\beta + \sum_{i=1}^{q} \theta_i \left[ \log(y_{t-i}^*) - x_{t-i}'\beta \right], \quad \text{where } y_{t-i}^* = \max(y_{t-i}, c).$$
These models have not been used very much in applications. One of the weaknesses of this specification is that ad hoc assumptions are needed to handle zeros.

Zeger (1988) extends the generalised linear models and introduces a latent multiplicative autoregressive term $\epsilon_t$ with unit expectation, variance $\sigma^2$ and autocorrelation $\rho_\epsilon$, which is responsible for introducing both autoregression and overdispersion into the model. The dependent variable is assumed to be a function of exogenous regressors $x_t$ with conditional mean $\mu_t = \exp(x_t \beta)$, where $\beta$ is the coefficient of interest. Conditionally on the latent variable, the model is equidispersed, $E[y_t|\epsilon_t] = V[y_t|\epsilon_t] = \mu_t \epsilon_t$, but the marginal variance depends both on the marginal mean and its square: $E[y_t] = \mu_t$ and $V[y_t] = \mu_t + \sigma^2 \mu_t^2$. In order to estimate this model with maximum likelihood, one would need to specify a density for $y_t|\epsilon_t$ and for $\epsilon_t$. In most cases, no closed-form solution would be available. Instead, a quasilikelihood method is adopted, which only requires knowledge of the mean, variance and autocorrelation of $y_t$. Given estimates of the parameters of the latent variable, obtained by the method of moments, the variance-covariance matrix $V$ of the dependent variable is formed: $V = A + \sigma^2 A R_\epsilon A$, where $A = \mathrm{diag}(\mu_i)$ and $R_\epsilon$ is the autocorrelation matrix of the latent variable, with $R_{\epsilon,j,k} = \rho_\epsilon(|j-k|)$. The quasilikelihood approach then consists in using the inverse of the variance matrix $V$ as a weight in the first order conditions. Since inversion of $V$ is quite cumbersome when the time series is long, an approximation is proposed. This method can be viewed as a count data analog of the Cochrane-Orcutt method for normally distributed time series, in which all the serial correlation is assumed to come from the error term. The method has been applied to sudden infant death syndrome by Campbell (1994). While conceptually this method is quite close to what is proposed in this paper, there are nonetheless important differences, in that it is fundamentally a static model, whereas the model in this paper is an explicitly dynamic one. The interest is not limited to getting correct inference about the parameters on the exogenous variables, but also lies in adequately capturing the dynamics of the system. In order to achieve this, a more parametric approach is taken, which, among other things, allows density forecasting.

3 The ACP Model

The first model proposed in this paper has counts follow a Poisson distribution with an autoregressive mean. The Poisson distribution is the natural starting point for counts. One characteristic of the Poisson distribution is that the mean is equal to the variance. This property is referred to as equidispersion. Most count data, however, exhibit overdispersion. Modeling the mean as an autoregressive process generates overdispersion even in the simple Poisson case.

In the simplest model, the counts are generated by a Poisson distribution
$$N_t | F_{t-1} \sim P(\mu_t), \qquad (3.1)$$
with an autoregressive conditional intensity as in the ACD model of Engle and Russell (1998) or the conditional variance in the GARCH (Generalised Autoregressive Conditional Heteroskedasticity) model of Bollerslev (1986):
$$E[N_t | F_{t-1}] = \mu_t = \omega + \sum_{j=1}^{p} \alpha_j N_{t-j} + \sum_{j=1}^{q} \beta_j \mu_{t-j}. \qquad (3.2)$$

The following properties of the unconditional moments of the ACP can be established.

Proposition 3.1 (Unconditional mean of the DACP(p,q) or the ACP(p,q)). The unconditional mean of the DACP(p,q) or the ACP(p,q) is
$$E[N_t] = \mu = \frac{\omega}{1 - \sum_{j=1}^{\max(p,q)} (\alpha_j + \beta_j)}.$$

This proposition shows that, as long as the sum of the autoregressive coefficients is less than 1, the model is stationary and the expression for its mean is identical to the mean of an ARMA process.

Proposition 3.2 (Unconditional variance of the ACP(1,1) Model). The unconditional variance of the ACP(1,1) model, when the conditional mean is given by
$$E[N_t | F_{t-1}] = \mu_t = \omega + \alpha_1 N_{t-1} + \beta_1 \mu_{t-1},$$
is equal to
$$V[N_t] = \sigma^2 = \mu \, \frac{1 - (\alpha_1+\beta_1)^2 + \alpha_1^2}{1 - (\alpha_1+\beta_1)^2}.$$

Proof of Proposition 3.2. Proof in appendix.

Proposition 3.2 shows that the ACP exhibits overdispersion, even though it uses an equidispersed marginal distribution. The model is overdispersed as long as $\alpha_1 \neq 0$, and the amount of overdispersion is an increasing function of $\alpha_1$ and also, to a lesser extent, of $\beta_1$. The following proposition establishes an expression for the autocorrelation function of the ACP.

Proposition 3.3 (Autocorrelation of the ACP(1,1) Model). The unconditional autocorrelation of the ACP(1,1) model is given by
$$\mathrm{Corr}[N_t, N_{t-s}] = \frac{\alpha_1 \left( 1 - \beta_1(\alpha_1+\beta_1) \right)}{1 - (\alpha_1+\beta_1)^2 + \alpha_1^2} \, (\alpha_1+\beta_1)^{s-1}.$$

Proof of Proposition 3.3. Proof in appendix.

It can be shown that the result in Proposition 3.3 holds also for the GDACP developed in the next section and for both the exponential and the Weibull versions of the ACD model of Engle and Russell (1998).
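Propositions 3.1-3.3 can be checked by simulation. The sketch below (illustrative parameter values) simulates an ACP(1,1) path and compares the sample mean, variance and first-order autocorrelation with the closed-form expressions; the variance exceeding the mean illustrates the unconditional overdispersion discussed above:

```python
import numpy as np

def simulate_acp(omega, alpha, beta, n, seed=1):
    """Simulate ACP(1,1): N_t | F_{t-1} ~ Poisson(mu_t),
    mu_t = omega + alpha * N_{t-1} + beta * mu_{t-1}."""
    rng = np.random.default_rng(seed)
    mu = omega / (1.0 - alpha - beta)   # start at the stationary mean
    N = np.empty(n, dtype=int)
    for t in range(n):
        N[t] = rng.poisson(mu)
        mu = omega + alpha * N[t] + beta * mu
    return N

omega, alpha, beta = 1.0, 0.3, 0.4
N = simulate_acp(omega, alpha, beta, n=200_000)

s = alpha + beta
mean_th = omega / (1.0 - s)                                    # Proposition 3.1
var_th = mean_th * (1.0 - s**2 + alpha**2) / (1.0 - s**2)      # Proposition 3.2
acf1_th = alpha * (1.0 - beta * s) / (1.0 - s**2 + alpha**2)   # Proposition 3.3, s = 1

acf1 = np.corrcoef(N[:-1], N[1:])[0, 1]
print(N.mean(), mean_th)   # ~3.33
print(N.var(), var_th)     # sample variance exceeds the mean: overdispersion
print(acf1, acf1_th)
```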

For the ACP the likelihood is
$$l_t(\gamma) = N_t \ln(\mu_t) - \mu_t - \ln(N_t!),$$
where $\gamma$ is a vector containing all the parameters of the autoregressive conditional intensity and the conditional intensity $\mu_t$ is autoregressive as in (3.2). The score and Hessian take the very simple form:
$$\frac{\partial l_t}{\partial \gamma} = \frac{N_t - \mu_t}{\mu_t} \, \frac{\partial \mu_t}{\partial \gamma},$$
$$\frac{\partial^2 l_t}{\partial \gamma \, \partial \gamma'} = -\frac{N_t}{\mu_t^2} \, \frac{\partial \mu_t}{\partial \gamma} \frac{\partial \mu_t}{\partial \gamma'},$$

where
$$\frac{\partial \mu_t}{\partial \gamma} = z_t + \sum_{j=1}^{q} \beta_j \frac{\partial \mu_{t-j}}{\partial \gamma}, \qquad (3.3)$$
and
$$z_t = [1, N_{t-1}, N_{t-2}, \ldots, N_{t-p}, \mu_{t-1}, \mu_{t-2}, \ldots, \mu_{t-q}]'. \qquad (3.4)$$
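The recursion (3.3) makes the analytic score cheap to compute alongside the likelihood. The sketch below (made-up counts, illustrative parameters, and a fixed, parameter-independent initial value for the conditional mean) verifies the analytic score of an ACP(1,1) against a finite-difference gradient:

```python
import numpy as np
from math import lgamma, log

def acp_loglik_and_score(params, N, mu0):
    """Poisson log-likelihood and analytic score of an ACP(1,1); the score is
    built from the recursion (3.3). mu0 is a fixed, parameter-independent start."""
    omega, alpha, beta = params
    mu = mu0
    dmu = np.zeros(3)                 # d mu_t / d gamma, gamma = (omega, alpha, beta)
    loglik, score = 0.0, np.zeros(3)
    for n in N:
        loglik += n * log(mu) - mu - lgamma(n + 1)
        score += (n - mu) / mu * dmu
        dmu = np.array([1.0, n, mu]) + beta * dmu   # z_t plus the recursive term
        mu = omega + alpha * n + beta * mu
    return loglik, score

N = np.array([2, 0, 1, 3, 5, 2, 2, 0, 1, 4])
params = np.array([1.0, 0.3, 0.4])
mu0 = float(N.mean())
_, score = acp_loglik_and_score(params, N, mu0)

eps = 1e-6
num = np.array([(acp_loglik_and_score(params + eps * e, N, mu0)[0]
                 - acp_loglik_and_score(params - eps * e, N, mu0)[0]) / (2 * eps)
                for e in np.eye(3)])
print(score)
print(num)   # should match the analytic score
```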

The score can easily be seen to be equal to zero in expectation. It is of interest in this model to test whether there is significant autocorrelation. In most time series applications a massive rejection can be expected. This can be done very simply with a likelihood ratio test (LR). The statistic will be equal to twice the difference between the unrestricted and the restricted likelihoods, which follows the usual $\chi^2$ distribution with two degrees of freedom. Both the restricted and the unrestricted models are easy to estimate. The simplicity of this test contrasts sharply with the difficulties associated with tests and estimation of the autocorrelation of a latent variable in the model of Zeger (1988), which have been recently addressed in Davis and Wang (2000).
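A minimal illustration of the test (simulated data, assumed parameter values): the restricted model is a constant-mean Poisson, whose maximum likelihood estimate is simply the sample mean; for brevity the unrestricted likelihood is evaluated at the data-generating parameters rather than at the maximum, which already makes the order of magnitude of the statistic clear:

```python
import numpy as np
from math import lgamma, log

def acp_loglik(params, N, mu0):
    """Poisson log-likelihood with ACP(1,1) conditional mean."""
    omega, alpha, beta = params
    mu, ll = mu0, 0.0
    for n in N:
        ll += n * log(mu) - mu - lgamma(n + 1)
        mu = omega + alpha * n + beta * mu
    return ll

# simulate from an ACP(1,1) with strong serial correlation
rng = np.random.default_rng(2)
omega, alpha, beta = 1.0, 0.3, 0.4
mu, N = omega / (1 - alpha - beta), np.empty(5000, dtype=int)
for t in range(N.size):
    N[t] = rng.poisson(mu)
    mu = omega + alpha * N[t] + beta * mu

# restricted model (alpha = beta = 0): the MLE of the constant mean is N.mean()
ll_restricted = acp_loglik((N.mean(), 0.0, 0.0), N, mu0=N.mean())
# unrestricted likelihood, here simply evaluated at the true parameters
ll_unrestricted = acp_loglik((omega, alpha, beta), N, mu0=N.mean())
LR = 2 * (ll_unrestricted - ll_restricted)
print(LR)  # far beyond the 5% chi-squared(2) critical value of 5.99
```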

4 The DACP Model

This section introduces two models based on the double Poisson distribution, which differ only by their variance function. The DACP1 has its variance proportional to the mean, whereas the DACP2 has a quadratic variance function.

In some cases one might want to break the link between overdispersion and serial correlation. It is quite probable that the overdispersion in the data is not attributable solely to the autocorrelation, but also to other factors, for instance unobserved heterogeneity. It is also possible that the amount of overdispersion in the data is less than the overdispersion resulting from the autocorrelation, in which case an underdispersed marginal distribution might be appropriate. In order to account for these possibilities the Poisson is replaced by the double Poisson, introduced by Efron (1986) in the regression context, which is a natural extension of the Poisson model and allows one to break the equality between conditional mean and variance. This density is obtained as an exponential combination, with parameter $\phi$, of the Poisson density of the observation $y$ with mean $\mu$ and of the Poisson with mean equal to the observation $y$, which can be thought of as the likelihood function taken at its maximum value.

The density of the Double Poisson is:
$$f(y; \mu, \phi) = \phi^{1/2} e^{-\phi\mu} \left( \frac{e^{-y} y^y}{y!} \right) \left( \frac{e\mu}{y} \right)^{\phi y}. \qquad (4.1)$$
$f(y; \mu, \phi)$ is not, strictly speaking, a density, since the probabilities don't add up to 1, but Efron (1986) shows that the value of the multiplicative constant $c(\mu, \phi)$ which makes it into a real density is very close to 1 and varies little across values of the dependent variable. He also suggests an approximation for this constant:
$$\frac{1}{c(\mu, \phi)} = 1 + \frac{1-\phi}{12\mu\phi} \left( 1 + \frac{1}{\mu\phi} \right).$$
As a consequence, he suggests maximising the approximate likelihood (leaving out the highly nonlinear multiplicative constant) in order to find the parameters, and using the correction factor when making probability statements based on the density.
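Efron's claims about the constant are easy to verify numerically. The sketch below (illustrative values of $\mu$ and $\phi$) sums the unnormalised density (4.1) over a grid of counts, compares the total with the approximation to $1/c(\mu,\phi)$, and checks that the mean and variance of the normalised distribution are close to $\mu$ and $\mu/\phi$:

```python
import numpy as np
from math import lgamma, log, exp

def double_poisson_pmf(y, mu, phi):
    """Unnormalised double Poisson density (4.1), computed in logs for stability.
    At y = 0 the density reduces to phi**0.5 * exp(-phi * mu)."""
    if y == 0:
        return exp(0.5 * log(phi) - phi * mu)
    logf = (0.5 * log(phi) - phi * mu
            - y + y * log(y) - lgamma(y + 1)
            + phi * y * (1 + log(mu) - log(y)))
    return exp(logf)

mu, phi = 5.0, 0.7
ys = np.arange(0, 200)
f = np.array([double_poisson_pmf(y, mu, phi) for y in ys])

total = f.sum()                                   # this is 1 / c(mu, phi)
approx = 1 + (1 - phi) / (12 * mu * phi) * (1 + 1 / (mu * phi))
p = f / total                                     # normalised probabilities
mean = (ys * p).sum()
var = ((ys - mean) ** 2 * p).sum()
print(total, approx)   # both close to 1
print(mean, var)       # close to mu = 5 and mu/phi ~ 7.14
```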

The advantages of using this distribution are that it can be both under- and overdispersed, depending on whether $\phi$ is larger or smaller than 1. This will prove particularly useful in a later section of this paper, when two separate processes are used for the variance and mean, because this density ensures that there is no possibility that the conditional mean become larger than the conditional variance, which would not be allowable with strictly overdispersed distributions. In the case of the Double Poisson (DP hereafter), the distributional assumption (3.1) is replaced by the following:
$$N_t | F_{t-1} \sim DP(\mu_t, \phi). \qquad (4.2)$$
It is shown in Efron (1986, Fact 2) that the mean of the Double Poisson is $\mu$ and that the variance is approximately equal to $\mu/\phi$. Efron (1986) shows that this approximation is highly accurate, and it will be used in the more general specifications.

For the simplest model, called the DACP1, the variance is a multiple of the mean:
$$V[N_t | F_{t-1}] = \sigma_t^2 = \frac{\mu_t}{\phi}. \qquad (4.3)$$
The coefficient $1/\phi$ on the conditional mean will be a parameter of interest, as values different from 1 will represent departures from the Poisson distribution. The following proposition gives an expression for the variance of the DACP1.

Proposition 4.1 (Unconditional variance of the DACP1(1,1) Model). The unconditional variance of the DACP1(1,1) model, when the conditional mean is given by
$$E[N_t | F_{t-1}] = \mu_t = \omega + \alpha_1 N_{t-1} + \beta_1 \mu_{t-1},$$
is equal to
$$V[N_t] = \sigma^2 = \mu \, \frac{1}{\phi} \, \frac{1 - (\alpha_1+\beta_1)^2 + \alpha_1^2}{1 - (\alpha_1+\beta_1)^2}.$$

Proof of Proposition 4.1. Proof in appendix.

Proposition 4.1 shows that, like the ACP, the DACP1 model exhibits overdispersion. Results for the ACP(1,1) are obtained by setting $\phi = 1$. The amount of overdispersion is an increasing function of $\alpha_1$ and also, to a lesser extent, of $\beta_1$. When $\alpha_1$ is zero, the overdispersion of the model comes exclusively from the Double Poisson distribution and is purely a function of the parameter $\phi$, a measure of the departure from the Poisson distribution. The overdispersion can be seen to be a product of two terms, which can be interpreted as the overdispersion due to the autocorrelation and the overdispersion of the marginal distribution, which is due to other factors:
$$\frac{\sigma^2}{\mu} = \frac{1}{\phi} \, \frac{1 - (\alpha_1+\beta_1)^2 + \alpha_1^2}{1 - (\alpha_1+\beta_1)^2}.$$

Another popular way of parameterising the dispersion is to allow for a quadratic relation between variance and mean, and this, along with the distributional assumption (4.2), defines the DACP2:
$$V[N_t | F_{t-1}] = \sigma_t^2 = \mu_t + \delta \mu_t^2. \qquad (4.4)$$
This parameterisation can be used along with the expression for the variance of the double Poisson, replacing $\phi$ in (4.1) with $\phi_t = \frac{1}{1 + \delta\mu_t}$. One potential problem with this specification, however, is that even though it can accommodate some underdispersion when $\delta < 0$, it is then not possible to exclude negative values of the variance, which arise when $\delta < -\frac{1}{\mu_t}$. For this reason, when underdispersion is observed, the DACP1 should be preferred. In this case the unconditional variance takes a somewhat different form, which is the object of the following proposition.

Proposition 4.2 (Unconditional variance of the DACP2(1,1) Model). The unconditional variance of the DACP2(1,1) model, when the conditional mean is given by
$$E[N_t | F_{t-1}] = \mu_t = \omega + \alpha_1 N_{t-1} + \beta_1 \mu_{t-1},$$
is equal to
$$V[N_t] = \sigma^2 = \mu \, \frac{(1 + \delta\mu)\left( 1 - (\alpha_1+\beta_1)^2 + \alpha_1^2 \right)}{1 - (\alpha_1+\beta_1)^2 - \delta\alpha_1^2}.$$

Proof of Proposition 4.2. Proof in appendix.

This shows that the DACP2 is overdispersed in general when $\delta > 0$, which is the case considered in this paper, and that overdispersion is an increasing function of the parameters of the mean equation and of the dispersion parameter. The two effects cannot be separated here as they could in the case of the DACP1. As mentioned earlier, the autocorrelation of both versions of the DACP is identical to the one of the ACP, which means that the dispersion properties of the marginal density do not affect the time series properties of the model.

For the DACP1 the likelihood is:
$$l_t(\gamma, \phi) = \frac{1}{2} \ln(\phi) - \phi\mu_t + N_t \ln(N_t) - N_t - \ln(N_t!) + \phi N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right),$$
where $\gamma$ is a vector containing all the parameters of the autoregressive conditional intensity and the conditional intensity $\mu_t$ is autoregressive:
$$E[N_t | F_{t-1}] = \mu_t = \omega + \sum_{j=1}^{p} \alpha_j N_{t-j} + \sum_{j=1}^{q} \beta_j \mu_{t-j}.$$
Differentiating with respect to $\gamma$ and $\phi$,
$$\frac{\partial l_t}{\partial \gamma} = \phi \, \frac{N_t - \mu_t}{\mu_t} \, \frac{\partial \mu_t}{\partial \gamma},$$
$$\frac{\partial l_t}{\partial \phi} = \frac{1}{2\phi} - \mu_t + N_t + N_t \ln\frac{\mu_t}{N_t}.$$

The first order condition with respect to $\gamma$ is just the ACP condition multiplied by the dispersion parameter $\phi$. Taking the expectation, one gets:
$$E\left[ \frac{\partial l_t}{\partial \gamma} \right] = 0.$$
The DACP1 is therefore a quasi maximum likelihood estimator: it is consistent as long as the mean is correctly specified, even if the true density is not double Poisson. The Hessian is given by:

$$\frac{\partial^2 l_t}{\partial \gamma \, \partial \gamma'} = -\phi \, \frac{N_t}{\mu_t^2} \, \frac{\partial \mu_t}{\partial \gamma} \frac{\partial \mu_t}{\partial \gamma'},$$
$$\frac{\partial^2 l_t}{\partial \gamma \, \partial \phi} = \frac{N_t - \mu_t}{\mu_t} \, \frac{\partial \mu_t}{\partial \gamma},$$
$$\frac{\partial^2 l_t}{\partial \phi^2} = -\frac{1}{2\phi^2},$$

where $\partial\mu_t/\partial\gamma$ and $z_t$ are as in (3.3) and (3.4). Taking the expectation of the cross-derivative, one gets:
$$E\left[ \frac{\partial^2 l_t}{\partial \gamma \, \partial \phi} \right] = 0.$$
The cross-derivative has an expectation of zero, so the expected Hessian is a block-diagonal matrix, which means that it is efficient to estimate $\phi$ independently from $\gamma$, and that the variances of the estimators of the mean and dispersion parameters are just the inverses of the diagonal elements of the Hessian.
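The DACP1 derivatives above can be sanity-checked against finite differences. In the sketch below (a single observation with illustrative values; $\mu_t$ is treated directly as the parameter, so that $\partial\mu_t/\partial\gamma = 1$), the analytic score matches the numerical gradient:

```python
from math import lgamma, log

def dacp1_loglik(mu, phi, n):
    """DACP1 contribution l_t(gamma, phi) for one observation n (n > 0)."""
    return (0.5 * log(phi) - phi * mu
            + n * log(n) - n - lgamma(n + 1)
            + phi * n * (1 + log(mu / n)))

mu, phi, n = 3.0, 0.8, 4

# analytic derivatives, as in the text
dl_dmu = phi * (n - mu) / mu                          # score w.r.t. the mean
dl_dphi = 1 / (2 * phi) - mu + n + n * log(mu / n)    # score w.r.t. phi

eps = 1e-6
num_mu = (dacp1_loglik(mu + eps, phi, n) - dacp1_loglik(mu - eps, phi, n)) / (2 * eps)
num_phi = (dacp1_loglik(mu, phi + eps, n) - dacp1_loglik(mu, phi - eps, n)) / (2 * eps)
print(dl_dmu, num_mu)
print(dl_dphi, num_phi)
```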

For the DACP2 the likelihood is more complicated. It can be obtained from the likelihood of the DACP1 by the following transformation:
$$L_t(\gamma, \delta) = l_t(\gamma, \phi_t(\gamma, \delta)),$$
$$L_t(\gamma, \delta) = -\frac{1}{2} \ln(1 + \delta\mu_t) - \frac{\mu_t}{1 + \delta\mu_t} + N_t \ln(N_t) - N_t - \ln(N_t!) + \frac{N_t}{1 + \delta\mu_t} \left( 1 + \ln\frac{\mu_t}{N_t} \right),$$

where $\gamma$ is a vector containing all the parameters of the autoregressive conditional intensity and the conditional intensity $\mu_t$ is autoregressive:
$$E[N_t | F_{t-1}] = \mu_t = \omega + \sum_{j=1}^{p} \alpha_j N_{t-j} + \sum_{j=1}^{q} \beta_j \mu_{t-j}.$$

Differentiating with respect to $\gamma$ and $\delta$,
$$\frac{\partial L_t}{\partial \gamma} = \frac{1}{1 + \delta\mu_t} \left[ -\frac{\delta}{2} + \frac{N_t}{\mu_t} - \frac{1}{1 + \delta\mu_t} \left( 1 + \delta N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) \right) \right] \frac{\partial \mu_t}{\partial \gamma},$$
$$\frac{\partial L_t}{\partial \delta} = \frac{\mu_t}{1 + \delta\mu_t} \left[ -\frac{1}{2} + \frac{1}{1 + \delta\mu_t} \left( \mu_t - N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) \right) \right].$$
It can be seen, by setting $\delta$ equal to zero in the score with respect to the parameters of the mean, that one gets back the first order condition of the Poisson model:
$$\frac{\partial L_t}{\partial \gamma} = \frac{N_t - \mu_t}{\mu_t} \, \frac{\partial \mu_t}{\partial \gamma}.$$
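Since the DACP2 derivatives are easy to get wrong, the same finite-difference check is worth repeating for the transformed likelihood $L_t$ (one observation, illustrative values, $\mu_t$ again treated directly as the parameter); the last line also confirms that setting $\delta = 0$ recovers the Poisson first order condition:

```python
from math import lgamma, log

def dacp2_loglik(mu, delta, n):
    """DACP2 contribution L_t(gamma, delta) for one observation n (n > 0)."""
    phi = 1 / (1 + delta * mu)
    return (0.5 * log(phi) - phi * mu
            + n * log(n) - n - lgamma(n + 1)
            + phi * n * (1 + log(mu / n)))

def dacp2_score_mu(mu, delta, n):
    """Analytic dL_t/dmu, as in the text."""
    phi = 1 / (1 + delta * mu)
    return phi * (-delta / 2 + n / mu - phi * (1 + delta * n * (1 + log(mu / n))))

def dacp2_score_delta(mu, delta, n):
    """Analytic dL_t/ddelta, as in the text."""
    phi = 1 / (1 + delta * mu)
    return mu * phi * (-0.5 + phi * (mu - n * (1 + log(mu / n))))

mu, delta, n = 3.0, 0.4, 4
eps = 1e-6
num_mu = (dacp2_loglik(mu + eps, delta, n) - dacp2_loglik(mu - eps, delta, n)) / (2 * eps)
num_delta = (dacp2_loglik(mu, delta + eps, n) - dacp2_loglik(mu, delta - eps, n)) / (2 * eps)
print(dacp2_score_mu(mu, delta, n), num_mu)
print(dacp2_score_delta(mu, delta, n), num_delta)
print(dacp2_score_mu(mu, 0.0, n), (n - mu) / mu)  # delta = 0: Poisson condition
```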

The second derivatives are:
$$\frac{\partial^2 L_t}{\partial \gamma \, \partial \gamma'} = \frac{1}{(1 + \delta\mu_t)^2} \left[ \frac{\delta^2}{2} - \frac{N_t(1 + \delta\mu_t)}{\mu_t^2} - \frac{2\delta N_t}{\mu_t} + \frac{2\delta}{1 + \delta\mu_t} \left( 1 + \delta N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) \right) \right] \frac{\partial \mu_t}{\partial \gamma} \frac{\partial \mu_t}{\partial \gamma'},$$
$$\frac{\partial^2 L_t}{\partial \gamma \, \partial \delta} = \frac{1}{(1 + \delta\mu_t)^2} \left[ -\frac{1}{2} - N_t - N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) + \frac{2\mu_t}{1 + \delta\mu_t} \left( 1 + \delta N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) \right) \right] \frac{\partial \mu_t}{\partial \gamma},$$
$$\frac{\partial^2 L_t}{\partial \delta^2} = \frac{\mu_t^2}{(1 + \delta\mu_t)^2} \left[ \frac{1}{2} - \frac{2}{1 + \delta\mu_t} \left( \mu_t - N_t \left( 1 + \ln\frac{\mu_t}{N_t} \right) \right) \right].$$
Again $\partial\mu_t/\partial\gamma$ and $z_t$ are as in (3.3) and (3.4).

In these models excess overdispersion can be tested with a simple t-test for $\phi = 1$ in the DACP1 and for $\delta = 0$ in the DACP2. Autocorrelation can be tested with a likelihood ratio test of a model in which the autoregressive parameters of the conditional mean are restricted to be zero against a more general alternative. One potential problem may lie in the fact that the double Poisson models are estimated by approximate maximum likelihood. The test is really an approximate likelihood ratio, where the approximation error $\log\left( c(\mu_u, \phi_u) / c(\mu_r, \phi_r) \right)$ is assumed to be close to zero.

In this section the equidispersion assumption of the Poisson model was relaxed and replaced by two alternative specifications of the mean-variance relation. In the first one, the conditional variance was allowed to be a multiple of the conditional mean, whereas in the DACP2 model, the variance was a quadratic function of the mean. In both of these formulations, however, the mean and variance are bound to covary in a rather restricted way. In the DACP1, by construction, the mean and variance have a correlation of 1. In the DACP2 this isn't the case, but mean and variance have to move together. Whenever the mean increases, the variance increases as well. This might be too restrictive in certain situations, for example with a rare disease where one does not really know whether a disease is spreading out or whether the large number of cases was just an isolated occurrence. When the number of cases is small it tends to be followed by small counts, and both mean and variance are then small. On the contrary, a larger count can be followed either by a large count (the disease is spreading out) or by a small count, in which case the mean does not change much, but the variance increases significantly as a result of this uncertainty. In order to deal with series of that sort, more flexible models have to be considered.

5 Extension to time-varying variance

This section introduces two different types of generalisations of the double Poisson models. The first, which will be called Generalised DACP (GDACP), adds a GARCH variance function to the DACP. The second type of extension consists of modeling the dispersion parameters of the DACP1 and DACP2 as autoregressive variables, and the resulting models are called the Generalised DACP1 (GDACP1) and Generalised DACP2 (GDACP2).

The advantage of the Double Poisson distribution is that it allows for separate models of the mean and of the dispersion. In the regression applications of Efron (1986), the mean is made to depend on one set of regressors and the dispersion depends on a different set, via a logistic link function. This is a natural parameterisation for the cases most often analysed in the literature, where certain variables are thought to affect the dispersion of the dependent variable. For example, in the toxoplasmosis data analysed by Efron (1986), which involves a double logistic, the dependent variable is the percentage of people affected by the disease in every city, the annual rainfall in every city is the explanatory variable, and the sample size in each city determines the amount of overdispersion. However, in the time series case which is the focus of this paper, there are in general no such variables. This is obviously true for univariate time series, but it is also the case for many other applications. In the univariate time series context dispersion will be modeled as a function of the past of the series. Two alternative specifications will be proposed, one in terms of the variance and the other in terms of the dispersion parameter.

The model which is written in terms of the variance will be examined first. This is a natural way to think about this problem, since in the modeling of economic time series, based implicitly on thinking about the normal distribution, the focus has been more on models of the variance than on models of the dispersion. Moreover, in terms of writing an autoregressive model of dispersion, it is not obvious what the lagged variable should be. As a consequence, it will also be difficult to calculate the theoretical unconditional moments of the model. In the case of variance, however, one can adopt GARCH-type variance functions and choose conditional variance as the lagged variable. With a GARCH process for the conditional variance, the dispersion parameter of the double Poisson can be expressed as a function of both the conditional mean and variance. The mean specification will be the same as in (3.2), but in addition, the conditional variance will be modeled as:
$$h_t \equiv E\left[ (N_t - \mu_t)^2 \,|\, F_{t-1} \right] = \omega_2 + \alpha_2 (N_{t-1} - \mu_{t-1})^2 + \beta_2 h_{t-1}. \qquad (5.1)$$
From here on, coefficients appearing in the conditional mean are indexed by 1 and coefficients of the conditional variance process are indexed by 2. For this specification, the dispersion coefficient of the simple double Poisson is expressed in terms of the conditional mean and variance of the process:
$$\phi_t = \frac{\mu_t}{h_t}. \qquad (5.2)$$
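To make the timing of the recursions concrete, the sketch below (illustrative parameter values and a short made-up count series) updates the conditional mean (3.2), the conditional variance (5.1) and the implied dispersion parameter (5.2) observation by observation:

```python
# illustrative GDACP(1,1,1,1) parameters: index 1 = mean, index 2 = variance
w1, a1, b1 = 0.5, 0.3, 0.2
w2, a2, b2 = 0.4, 0.2, 0.3

N = [2, 0, 3, 1]                      # made-up count series
mu = w1 / (1 - a1 - b1)               # initialise at the unconditional moments
h = w2 / (1 - a2 - b2)

mus, hs, phis = [], [], []
for n in N:
    # one-step-ahead update given the new observation n
    mu_next = w1 + a1 * n + b1 * mu
    h_next = w2 + a2 * (n - mu) ** 2 + b2 * h
    mu, h = mu_next, h_next
    mus.append(mu); hs.append(h); phis.append(mu / h)   # phi_t = mu_t / h_t

print(mus)   # first value: 0.5 + 0.3*2 + 0.2*1.0 = 1.3
print(phis)  # phi_t > 1: conditional underdispersion; phi_t < 1: overdispersion
```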

The following proposition gives an expression for the unconditional variance of the GDACP when both the conditional mean and variance have one autoregressive and one moving average parameter.

Proposition 5.1 (Unconditional variance of the GDACP(1,1,1,1) Model). The variance of the generalised model is given by:
$$V[N_t] = \sigma^2 = \frac{\omega_2 \left( 1 - (\alpha_1+\beta_1)^2 + \alpha_1^2 \right)}{(1 - \alpha_2 - \beta_2)\left( 1 - (\alpha_1+\beta_1)^2 \right)}.$$

Proof of Proposition 5.1. Proof in appendix.

The variance is a product of the unconditional mean of the variance process and a term depending on the mean equation parameters. Some insight can be gained by dividing both sides of this equation by the unconditional mean of the counts and making use of (3.2):
$$\frac{\sigma^2}{\mu} = \frac{\omega_2}{\omega_1} \, \frac{1 - (\alpha_1+\beta_1)^2 + \alpha_1^2}{(1 - \alpha_2 - \beta_2)(1 + \alpha_1 + \beta_1)}.$$
It is now apparent that the overdispersion of the GDACP is a product of the long-term mean of the variance process and of a term which is decreasing in $\alpha_1$ and $\beta_1$. This somewhat counterintuitive result is due to the fact that the variance process gives rise to most of the overdispersion and that the autoregressive parameters of the mean, while increasing the unconditional variance, actually increase the mean even more for a given $\omega_1$, which in aggregate decreases the overdispersion.

It is of interest to see how the GDACP relates to the models presented in the previous sections. The GDACP model nests the plain double Poisson, obtained when α_1 = β_1 = α_2 = β_2 = 0, with ω_2 = ω_1/γ in the case of the DACP1 and ω_2 = ω_1(1 + γω_1) in the case of the DACP2, and the plain Poisson, obtained when α_1 = β_1 = α_2 = β_2 = 0 and ω_2 = ω_1, but it does not nest non-trivial ACP or DACP models where α_1 ≠ 0 and β_1 ≠ 0. A consequence of this is that it is not possible to test the GDACP against the previous models with a test of nested hypotheses. One would have to resort to tests of non-nested hypotheses, which in the present case would be quite cumbersome, due to the fact that the conditional mean and variance are unobserved autoregressive processes.

The likelihood function of the GDACP is given below:

l_t(θ_1, θ_2) = (1/2) ln(μ_t/h_t) − μ_t²/h_t + N_t (ln N_t − 1) − ln N_t! + (μ_t N_t / h_t) [1 + ln(μ_t/N_t)],

where μ_t and h_t are as defined above and where θ_1 = [ω_1, α_1, β_1] and θ_2 = [ω_2, α_2, β_2].
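The per-observation likelihood can be coded directly from this expression. The sketch below (function name is mine; the approximate normalising constant of Efron (1986) is set to one, as is standard) also illustrates the limiting case h_t = μ_t, i.e. φ_t = 1, in which the double Poisson collapses to the ordinary Poisson:

```python
from math import lgamma, log

def dp_loglik(n, mu, h):
    """Log-likelihood contribution of one count n under the double
    Poisson with conditional mean mu and conditional variance h
    (dispersion phi = mu / h); n ln n is taken as 0 for n = 0,
    and Efron's normalising constant is approximated by 1."""
    out = 0.5 * log(mu / h) - mu ** 2 / h - lgamma(n + 1)
    if n > 0:
        out += n * (log(n) - 1.0) + (mu * n / h) * (1.0 + log(mu / n))
    return out
```

Setting h = mu recovers the Poisson log pmf −μ + n ln μ − ln n!, which is a convenient sanity check on the transcription.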

The score of the model is given by:

∂l_t/∂θ_1 = [1/(2μ_t) − 2μ_t/h_t + (N_t/h_t)(2 + ln(μ_t/N_t))] ∂μ_t/∂θ_1,

∂l_t/∂θ_2 = [−1/(2h_t) + (μ_t/h_t²)(μ_t − N_t − N_t ln(μ_t/N_t))] ∂h_t/∂θ_2.

The Hessian is:

∂²l_t/(∂θ_1 ∂θ_1′) = [N_t/(μ_t h_t) − 1/(2μ_t²) − 2/h_t] (∂μ_t/∂θ_1)(∂μ_t/∂θ_1′),

∂²l_t/(∂θ_1 ∂θ_2′) = [2μ_t/h_t² − (N_t/h_t²)(2 + ln(μ_t/N_t))] (∂μ_t/∂θ_1)(∂h_t/∂θ_2′),

∂²l_t/(∂θ_2 ∂θ_2′) = [1/(2h_t²) − (2μ_t/h_t³)(μ_t − N_t − N_t ln(μ_t/N_t))] (∂h_t/∂θ_2)(∂h_t/∂θ_2′).
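Derivatives like these are easy to get wrong when transcribing, so a finite-difference check against the log-likelihood is worthwhile. The sketch below (function names are mine; the normalising constant is again omitted) verifies the two score terms numerically:

```python
from math import lgamma, log

def l(n, mu, h):
    # double Poisson log-likelihood term (normalising constant omitted)
    out = 0.5 * log(mu / h) - mu ** 2 / h - lgamma(n + 1)
    if n > 0:
        out += n * (log(n) - 1.0) + (mu * n / h) * (1.0 + log(mu / n))
    return out

def score_mu(n, mu, h):
    # analytic d l / d mu, as in the text
    return 1.0 / (2 * mu) - 2 * mu / h + (n / h) * (2.0 + log(mu / n))

def score_h(n, mu, h):
    # analytic d l / d h, as in the text
    return -1.0 / (2 * h) + (mu / h ** 2) * (mu - n - n * log(mu / n))

def central_diff(f, x, eps=1e-6):
    # two-sided numerical derivative
    return (f(x + eps) - f(x - eps)) / (2 * eps)
```

The same trick extends to the Hessian terms by differencing the analytic scores.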

The GDACP model is convenient because it has many features which make it familiar to time series econometricians; however, it does not nest simpler models like the DACP. This makes testing the general model against the more restrictive ones difficult. To remedy this, one can model the dispersion parameter of the double Poisson directly. Unlike for the variance, when modeling time-varying dispersion there is no natural candidate for the lagged dependent variable. As a consequence a logistic link function will be used, as in Efron (1986):

γ_t = M / (1 + exp(−ψ_t)),   (5.3)

along with an autoregressive process for ψ_t:

ψ_t = ω_2 + α_2 (N_{t−1} − μ_{t−1}) / √μ_{t−1} + β_2 ψ_{t−1}.

M is the maximum possible overdispersion which, as in Efron (1986), will not be estimated but set by trial and error. In all the applications it will be set to 10. Unlike in the previous model, the unconditional moments are not easy to compute explicitly. In the present formulation, however, the regressor is a martingale difference sequence, which makes ψ_t a stationary ARMA process. It can be seen that this dispersion model nests the DACP1 when α_2 = β_2 = 0 and the ACP when, in addition, ω_2 = −log(M − 1), but not the DACP2.
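The logistic-link recursion can be sketched as follows (a sketch with my own function name; the fixed-point initialisation of ψ is an assumption). The nesting property just described serves as a check: with α_2 = β_2 = 0 and ω_2 = −log(M − 1), the dispersion is identically one, i.e. the ACP.

```python
from math import exp, log, sqrt

def gamma_path(counts, mus, omega2, alpha2, beta2, M=10.0):
    """Dispersion path gamma_t = M / (1 + exp(-psi_t)) with
    psi_t = omega2 + alpha2*(N_{t-1} - mu_{t-1})/sqrt(mu_{t-1})
                   + beta2*psi_{t-1}.
    psi is started at its unconditional fixed point (an assumption)."""
    psi = omega2 / (1.0 - beta2) if beta2 != 1.0 else omega2
    gammas = []
    for n, mu in zip(counts, mus):
        gammas.append(M / (1.0 + exp(-psi)))     # gamma_t uses t-1 information
        psi = omega2 + alpha2 * (n - mu) / sqrt(mu) + beta2 * psi
    return gammas
```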

It is possible, however, to build a model which nests the DACP2 by setting γ, defined in (4.4), to be a function of ψ_t as in (5.3). These models will be called GDACP1 and GDACP2 respectively, by reference to the DACP1 and DACP2 models that they generalise. Table 1 contains a summary of the model specifications and the chart on page 29 shows how the various models relate to each other. Testing for time-varying overdispersion in the GDACP1 and GDACP2 can be done with a likelihood ratio test against the null hypothesis of DACP1 and DACP2 respectively. Gurmu and Trivedi (1993) consider artificial regression tests for dispersion models which could easily be generalised to time-varying overdispersion. These tests would then be very similar to GARCH tests, with the exception that they are based on the deviations instead of the residuals.

An advantage of the double Poisson over any overdispersed density, alluded to earlier on, becomes more obvious now. The double Poisson distribution can be either over- or underdispersed, which means that the conditional variance can be either larger or smaller than the conditional mean. With a strictly overdispersed distribution, there would be difficulties each time the mean gets larger than the variance in the GDACP, or when the dispersion gets smaller than 1 for the GDACP1 or GDACP2, which could happen in the numerical optimisation of the likelihood or when making out-of-sample forecasts. There is no constraint on the parameters of the model which could ensure that this does not happen. With the double Poisson, however, there is no need to worry about this possibility.

The models can easily be generalised to include explanatory variables. It is possible to multiply the conditional mean, variance or overdispersion by a function of exogenous regressors in exactly the same way as in Zeger (1988). In the case of the Poisson, this can be done by using the natural link function, which is the exponential. Instead of using μ_t and h_t directly, one could use μ_t exp(Xδ_1) and h_t exp(Xδ_2) as the new conditional mean and variance. Another possibility is to include regressors directly in the conditional mean or variance equation. Obviously the model proposed here can also be extended to most of the variations of the GARCH family, which is a very rich class of models.

6 Data analysis

In order to demonstrate how the model works, it is applied to two datasets in completely different areas. The first is a medical example and the second is an application to stock market data.

6.1 Incidence of poliomyelitis in the U.S.

The first application of the model is to the monthly number of cases of poliomyelitis in the United States between 1970 and 1983, which was analysed as an example in Zeger (1988). Zeger estimated three models, two of which assume that repeated observations are independent and either follow a negative binomial distribution in one case or have a constant coefficient of variation in the other case. The model proposed by Zeger considers that there is a latent process which generates overdispersion and autocorrelation. The time series of polio cases seems to have become a benchmark for time series count models. It has been analysed by Brannas and Johansson (1994), who study properties of different estimators for the parameters of the latent variable in the Zeger model which are responsible for overdispersion and autoregression. Davis, Dunsmuir, and Wang (2000) propose a test for the presence of a latent variable and a correction for the standard errors of the coefficients on exogenous regressors based on asymptotic theory. They apply their method to the polio series to show that their theoretical standard errors are close to simulated ones. Jorgensen, Lundbye-Christensen, Song, and Sun (1999) propose a nonstationary state space model and apply it to the same dataset. Fahrmeir and Tutz (1994) report results from the estimation of a log-linear Poisson model which includes lagged dependent variables.

Upon examining the series of polio cases (see Figure 2), it is not at all obvious whether this dataset provides evidence of declining incidence of polio in the U.S., which is one of the questions of interest. The histogram reveals that, with the exception of an outlier at 14, which could well be a recording error, the counts are very small, with more than 60% of zeros. The series is clearly overdispersed, with a mean of 1.33 and a variance of 3.50. Figure 3 shows the autocorrelations of the number of polio cases after an outlier has been taken away. There is significant autocorrelation in levels and squares, and there is a clear pattern in the sign of the autocorrelation, which is first positive, then negative and positive again at very high lags.

Results from models without exogenous regressors are reported in Table 2. The outlier of 14 has been taken out, as it had a strong impact on some of the coefficients. Models are evaluated on the basis of their log-likelihood, but also on the basis of their Pearson residuals, which are defined as ε_t = (N_t − μ_t)/√μ_t, where a correction 1/(T − K) for the number of degrees of freedom was used. If a model is well specified, the Pearson residuals will have variance one and no significant autocorrelation left. Autocorrelation is tested with a likelihood ratio test (LR), which gives a test statistic of 57.4, much larger than the 5% critical value of the χ²(2) distribution, 5.99.
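The residual diagnostics can be sketched in a few lines (function names are mine; the degrees-of-freedom convention, dividing the sum of squared residuals by T − K without demeaning, is one plausible reading of the correction described above):

```python
from math import sqrt

def pearson_residuals(counts, mus):
    # epsilon_t = (N_t - mu_t) / sqrt(mu_t)
    return [(n - mu) / sqrt(mu) for n, mu in zip(counts, mus)]

def corrected_variance(resid, k):
    """Residual variance with a degrees-of-freedom correction 1/(T - K),
    where k is the number of estimated parameters (an assumed convention)."""
    return sum(e * e for e in resid) / (len(resid) - k)
```

A well-specified model should make this corrected variance close to one.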

The simplest model is the ACP which, while providing a good fit and capturing some autocorrelation, leaves an overdispersed residual with a variance of 1.70. The DACP1 and DACP2 correct for this, reducing this variance to 1.05 and .96 respectively. Both a Wald test, which in this case is the same as a simple t-test of the null that γ = 1 in the DACP1 or γ = 0 in the DACP2, and a likelihood ratio (LR) test reject the ACP model. As was to be expected, the double Poisson makes it possible to fit the autocorrelation and the overdispersion at the same time.

Most parameters are significant at the 5% level. The parameters of the conditional mean for the three models are very similar, which indicates that the ACP was adequately capturing the time series aspect of the data while missing some of the dispersion. They imply a relatively high degree of persistence, with the sum of the autoregressive parameters around .75 for most models. Amongst the more general models the GDACP2 performs best in terms of the dispersion, since it leaves a standardised residual with a variance of 1.00, but the GDACP1 seems to provide a slightly better fit. The GDACP1 is preferred to the DACP1 that it nests, as shown by an LR test statistic of 8, but it is not possible to reject the DACP2 against the more general GDACP2 (the LR test statistic is .8). For an overview of the models, see Table 1 and the chart on page 29. The model which is formulated with a GARCH variance function does not perform as well as the dispersion models.

Figure 4 shows the autocorrelations of the Pearson residuals for the models considered so far. There is a great overall reduction in the level of autocorrelation below the Bartlett confidence interval, which lies at .154, except for very few values. The conclusion of the Ljung-Box statistic for autocorrelation, which does not allow one to accept the null hypothesis of zero autocorrelations at any lag in the original series, is reversed at all lags for the standardised residuals of the models. The pattern observed in the autocorrelogram of the series has been replaced by a pattern at a higher frequency. It can be expected that this pattern will disappear with the inclusion of seasonality dummies. Figure 5 reveals that all models have very small autocorrelations in squares and have lost the low frequency pattern that is present in the original series.

In order to compare the results to those of Zeger (1988), models using the same set of regressors were estimated. These include trigonometric seasonality variables at yearly and half-yearly frequencies as well as a time trend, since interest lies in whether or not the present dataset provides evidence of declining incidence of polio in the U.S. The results, shown in Table 3, are qualitatively similar to Zeger's results. In all the models the coefficient on the time trend is negative and insignificant. This coefficient is systematically smaller in absolute value across all the specifications than what Zeger reports. The coefficients of the seasonality variables are more or less the same as in Zeger (1988). The results seem to be even closer to what Fahrmeir and Tutz (1994) report in their Table 6.2 than to Zeger's results. Magnitudes vary a little but the signs are always the same. In all models, the seasonality variables are significant as a group. Again autocorrelation is tested, and the null of no serial correlation is rejected with a test statistic of 24, very much in excess of the 5.99 critical value under the null. The overall picture for the models including exogenous regressors is the same as for the ones which do not. LR tests reject the Poisson model in favour of the double Poisson. The DACP2 is the best amongst the simpler models, both in terms of the likelihood and in the modeling of dispersion, since it leaves an almost perfectly equidispersed residual. The GDACP1 fits better, but does slightly worse than the DACP2 in terms of the residuals. An LR test rejects the DACP1 in favour of the GDACP1, but it is not possible to reject the DACP2 against the alternative hypothesis of the GDACP2 at the 5% level.

The autocorrelations of the standardised errors are shown in Figures 6 and 7, and they are quite similar to the previous ones. This is somewhat of a surprise, as it could be expected that what seems to be a systematic seasonal pattern in the autocorrelations would disappear after inclusion of seasonality variables. Maybe this apparent pattern is not very important, since it is clearly below the Bartlett significance level and insignificant also according to the Ljung-Box statistic. Alternatively it could suggest that the seasonality has not been taken into account properly, but a little experimenting with alternative parameterisations suggests that this is not the case. This example shows that the ACP family of models is able to capture autocorrelation and dispersion successfully and whiten the residuals, while getting similar results to Zeger (1988) in terms of coefficients on exogenous regressors and their standard errors.

6.2 Daily number of price changes

The second application of the model is to the daily number of price-change durations of $.75 on the IBM stock on the New York Stock Exchange. A $.75 price-change duration is defined as the time it takes the stock price to move by at least $.75. The variable of interest is the daily number of such durations, which is a measure of intradaily volatility, since the more volatile the stock price is within a day, the more often it will cross a given threshold and the larger the counts will be. Midquotes from the Trades and Quotes (TAQ) dataset will be used to compute the number of times a day that the price moves by at least $.75, using the "five second rule" of Lee and Ready (1991) to compute midquotes prevailing at the time of the trade. For robustness it is required that the following midquote not revert the price change.

Let us denote by S_t the midquote price of the asset and by τ_n the times at which the threshold is crossed:

τ_{n+1} = inf { t > τ_n : |S_t − S_{τ_n}| ≥ d }.   (6.1)

The durations are then defined as x_n = τ_{n+1} − τ_n, and the object of interest is the daily number of such durations, which is a measure of intradaily volatility. Unlike the volatility that can be extracted from daily returns, for example with GARCH, the counts are a daily measure based on the price history within the day, and this will contain information that volatility measures based on daily data are missing. For such a series, interest lies amongst other things in forecasting, as volatility is an essential indicator of market behaviour as well as an input in many asset pricing problems.
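The construction of the count series in (6.1) can be sketched as follows: walk through each day's midquote path, record a crossing whenever the price has moved by at least d = $.75 from the level at the previous crossing, and count crossings per day. This is a simplified sketch (my own function; it ignores the Lee-Ready midquote alignment and the reversal filter, and restarts the reference level at each day's first quote):

```python
def daily_crossing_counts(days, d=0.75):
    """days: list of intraday midquote paths (one list of prices per day).
    Returns the number of times per day that the price moves by at least
    d from the price recorded at the previous threshold crossing."""
    counts = []
    for path in days:
        ref = path[0]   # reference level: price at the last crossing
        c = 0
        for s in path[1:]:
            if abs(s - ref) >= d:
                c += 1
                ref = s
        counts.append(c)
    return counts
```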

As can be seen from the plotted series in Figure 9, the counts have episodes of high and low mean as well as variance, which suggests that autoregressive modeling should be appropriate. The histogram reveals that the data range from 0 to 30. The data is overdispersed, with a mean of 5.98 and a variance of 20.4. Also, Figure 8 shows that the autocorrelations of the series in levels and squares are clearly significant up to a relatively large number of lags (Bartlett's 95% confidence interval under the null of iid is .10). Significant autocorrelation is also present in the third and fourth powers, although to a lesser degree.

A series of models is estimated, ranging from the simple ACP to the most general GDACP, and the results are reported in Table 4. Most parameters are significant, with t-statistics well above 2. All the models imply that the series is quite persistent, with α_1 + β_1 in the order of .82 to .94. The GDACP implies that the variance is also very persistent, with a value of .912 for the sum of the autoregressive parameters. Again the simple Poisson model is rejected in favour of the DACP1 and DACP2. The DACP2 has standardised errors with a variance very near 1. The model in variance does not perform well, either in likelihood or in terms of its standardised errors. The DACP1 is rejected in favour of the GDACP1, but the DACP2 cannot be rejected at the 5% level. The preferred model is then the DACP2, which has the best likelihood of all the models and which leaves an almost equidispersed error term. Another way to check the specification is to look at the autocorrelation of the Pearson residuals, which should be insignificant if the time series dependence has been well accounted for by the model. Figure 10 shows the autocorrelogram of the standardised errors of the various models. This reveals that the standardised errors have no more autocorrelation left and they have lost any of the systematic patterns that were present in the original series. Figure 11 shows that the same is true for the squared standardised errors. The autocorrelograms of the residuals to the third and fourth power show very low autocorrelation, which indicates that the models capture the serial correlation in the first four moments well.

The models are also evaluated with respect to the quality of their out-of-sample forecasts. The models are estimated on a starting sample of 202 observations, then a one-step-ahead forecast is calculated and the model is reestimated for every period with all the available information. First the quality of the point forecasts from each one of the models is evaluated. The Root Mean Squared Errors (RMSE) of the various models are quite close to each other. The models which do best are the DACP2 and GDACP2, but the ACP and the DACP1 do almost as well. In terms of the Standardised Sum of Errors the DACP2 performs best, with a variance of 1.29, the closest to one of all models. A decomposition of the forecasting error shows that the bias is very small in most models, which means that the forecasts are right on average. The variance proportion of the forecast for most models is around .25, which means that the variance of the forecast and the variance of the original series are quite close to each other. The remaining part of the forecasting error is unsystematic forecasting error. It is a good sign for the forecasts that most of the error is of the unsystematic type. Using that measure, the DACP2 is again seen to be the best, with 77% of the forecasting error being of the unsystematic type.
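The decomposition referred to here is, presumably, the standard Theil-style split of the mean squared forecast error into bias, variance, and covariance proportions; a sketch (function name is mine):

```python
from math import sqrt

def theil_proportions(forecasts, actuals):
    """Split the MSE of a forecast into bias, variance and covariance
    proportions; the three proportions sum to one."""
    t = len(actuals)
    mse = sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / t
    mf = sum(forecasts) / t
    ma = sum(actuals) / t
    sf = sqrt(sum((f - mf) ** 2 for f in forecasts) / t)
    sa = sqrt(sum((a - ma) ** 2 for a in actuals) / t)
    bias = (mf - ma) ** 2 / mse          # systematic level error
    var = (sf - sa) ** 2 / mse           # systematic volatility error
    cov = 1.0 - bias - var               # unsystematic remainder
    return bias, var, cov
```

A large covariance proportion, as reported for the DACP2, means most of the error is unsystematic.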

Another way to test the accuracy of the models is to evaluate how good they are at forecasting the density of the counts out-of-sample. For this purpose the method developed by Diebold, Gunther, and Tay (1997) is used, which consists in computing the cumulative probability of the observed values under the forecast distribution. If the density from the model is accurate, these values will be uniformly distributed and will have no significant autocorrelation left, neither in levels nor when raised to integer powers. In order to assess how close the distribution of the Z variable is to a uniform, quantile plots of Z against quantiles of the uniform distribution are shown. The closer the plot is to a 45°-line, the closer the distribution is to a uniform. The quantile plots of the various models are shown in Figure 12. The Z statistic for most models is quite close to the 45°-line. The GDACP1 is the model which performs best. The Poisson model gives too little weight to large observations, as is reflected in the fact that the curve is clearly below the 45°-line between .6 and 1. This is present to a certain degree in all of the plots, but less so for the more sophisticated models. The ACP and GDACP1 give too little weight to zeros, whereas all other models attribute too much probability mass to them. All the models seem to have difficulty with the right tail: they cannot quite accommodate as many large values as are present in the data. Diebold, Gunther, and Tay (1997) propose to graphically inspect the correlogram along with the usual Bartlett confidence intervals. For the present case, this means that all correlations smaller in absolute value than .141 can be considered not to be significant. The autocorrelations for the models are displayed in Figures 13 and 14. In general the models perform very well, with only very few significant correlations left. It is difficult to discriminate between models based on these autocorrelations. It seems, though, that the GDACP1 has large negative autocorrelations at small lags, both in levels and in squares. In terms of the Ljung-Box statistic, which has acceptable properties according to the Monte Carlo study in Brannas and Johansson (1994), the GDACP and GDACP2 have the highest p-values for levels and squares, followed by the DACP1. The GDACP, which does quite poorly in-sample, seems to capture the time-series behaviour adequately out-of-sample.
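The probability integral transform used above amounts to evaluating the one-step-ahead forecast cdf at each realised count. A sketch for a Poisson forecast density (for the double Poisson variants one would substitute the corresponding cdf; function names are mine):

```python
from math import exp

def poisson_cdf(n, mu):
    # P(N <= n) for Poisson(mu), accumulating the pmf recursively
    p = exp(-mu)      # pmf at 0
    c = p
    for k in range(1, n + 1):
        p *= mu / k   # pmf at k from pmf at k-1
        c += p
    return c

def pit_values(counts, mus):
    # z_t = F(N_t | mu_t): approximately U(0,1) if the forecast
    # density is correct (up to the discreteness of the counts)
    return [poisson_cdf(n, mu) for n, mu in zip(counts, mus)]
```

The resulting z_t series is what is plotted against uniform quantiles and whose correlogram is inspected.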

7 Conclusion

This paper introduces new models for time series count data. These models have proved very flexible and easy to estimate. They make it possible to correct standard errors and improve inference on parameters of exogenous regressors compared to static Poisson regression, like other methods, particularly the one proposed by Zeger (1988), which has attracted a lot of attention recently, while modeling the time series dependence in a more flexible way. It is shown that these models perform well and can explain both autocorrelation and dispersion in the data. The biggest advantage of this framework is that it is possible to apply straightforward likelihood-based tests for autocorrelation or overdispersion. Finally, this method also makes it possible to perform point and density forecasts, which successfully pass a series of forecast evaluation tests. An interesting question is how this model can be generalised to a multivariate framework, and this is the object of ongoing research.

8 Appendix

Proof of Proposition 3.1. Same as the proof of Lemma 1 in Engle and Russell (1998).

Proof of Proposition 4.1. Upon substitution of the mean equation in the autoregressive intensity, one obtains:

μ_t − μ = α_1 (N_{t−1} − μ) + β_1 (μ_{t−1} − μ),

μ_t − μ = α_1 (N_{t−1} − μ_{t−1}) + (α_1 + β_1)(μ_{t−1} − μ).

Squaring and taking the expectation gives:

E[(μ_t − μ)²] = α_1² E[(N_{t−1} − μ_{t−1})²] + (α_1 + β_1)² E[(μ_{t−1} − μ)²].

Using the law of iterated expectations and substituting the conditional variance σ²_{t−1} for its expression, one gets:

E[(μ_t − μ)²] = α_1² E[μ_{t−1}/γ] + (α_1 + β_1)² E[(μ_{t−1} − μ)²].   (8.1)

Collecting terms, one gets:

V[μ_t] = E[(μ_t − μ)²] = (α_1² μ/γ) / [1 − (α_1 + β_1)²].   (8.2)

Now, applying the following property of the conditional variance,

V[y] = E_x[V_{y|x}(y|x)] + V_x[E_{y|x}(y|x)],   (8.3)

to the counts, one obtains:

E[(N_t − μ)²] = E[(N_t − μ_t)²] + E[(μ_t − μ)²].   (8.4)

Again using the law of iterated expectations, substituting the conditional variance σ²_t for its expression, then making use of the previous result, and after finally collecting terms, one gets the announced result.

Proof of Proposition 4.2. This proof is similar to the proof of Proposition 4.1. When substituting the conditional variance σ²_{t−1} for its expression, instead of (8.1) one gets:

E[(μ_t − μ)²] = α_1² E[μ_{t−1} + γ μ²_{t−1}] + (α_1 + β_1)² E[(μ_{t−1} − μ)²].

Similarly, instead of (8.2), one gets:

V[μ_t] = E[(μ_t − μ)²] = α_1² μ (1 + γμ) / [1 − (α_1 + β_1)² − γ α_1²].

Now, applying (8.3) to the counts, one obtains (8.4). The result then follows after applying the same steps as in the previous proof.

Proof of Proposition 3.3. As a consequence of the martingale property, deviations between the time-t value of the dependent variable and the conditional mean are independent of the information set at time t. Therefore:

E[(N_t − μ_t) μ_{t−s}] = 0   ∀ s ≥ 0.

By distributing (N_t − μ_t), one gets:

Cov[N_t, μ_{t−s}] = Cov[μ_t, μ_{t−s}]   ∀ s ≥ 0.   (8.5)

By the same "non-anticipation" condition used above, it must be true that:

E[(N_t − μ_t) N_{t−s}] = 0   ∀ s > 0.

Again, distributing (N_t − μ_t), one gets:

Cov[N_t, N_{t−s}] = Cov[μ_t, N_{t−s}]   ∀ s > 0.   (8.6)

Now,

Cov[μ_t, μ_{t−s}] = α_1 Cov[N_t, μ_{t−s+1}] + β_1 Cov[μ_t, μ_{t−s+1}]
                  = (α_1 + β_1) Cov[μ_t, μ_{t−s+1}]
                  = (α_1 + β_1)^s V[μ_t].

The first line was obtained by replacing μ_t by its expression, the second line by making use of (8.5), and the last line follows from iterating line two. Similarly,

Cov[μ_t, μ_{t−s+1}] = α_1 Cov[μ_t, N_{t−s}] + β_1 Cov[μ_t, μ_{t−s}].

Rearranging and making use of (8.6), one gets:

Cov[N_t, N_{t−s}] = (1/α_1) Cov[μ_t, μ_{t−s+1}] − (β_1/α_1) Cov[μ_t, μ_{t−s}]
                  = (1/α_1) [1 − β_1(α_1 + β_1)] (α_1 + β_1)^{s−1} V[μ_t].

Finally,

Corr[N_t, N_{t−s}] = (1/α_1) [1 − β_1(α_1 + β_1)] (α_1 + β_1)^{s−1} V[μ_t] / V[N_t].

Replacing V[μ_t] and V[N_t] by their respective values, the result follows.

Proof of Proposition 5.1. This proposition can be proved by proceeding in a similar fashion as in the proof of Proposition 3.2, but making use of the expression for the conditional variance h_t. As a result of the first step, instead of (8.2), one obtains:

V[μ_t] = E[(μ_t − μ)²] = α_1² E[h_{t−1}] / [1 − (α_1 + β_1)²].   (8.7)

It can be seen from (5.1) that:

E[h_t] = ω_2 / (1 − α_2 − β_2).   (8.8)

Applying property (8.3), making use of (8.8), and after simplification, one gets the announced result.

References

Bollerslev, Tim, 1986, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307-327.

———, Robert F. Engle, and Daniel B. Nelson, 1994, ARCH models, in Robert F. Engle and Daniel L. McFadden, eds.: Handbook of Econometrics, Volume 4 (Elsevier Science: Amsterdam, North-Holland).

Brannas, Kurt, and Per Johansson, 1994, Time series count data regression, Communications in Statistics: Theory and Methods 23, 2907-2925.

Cameron, A. Colin, and Pravin K. Trivedi, 1998, Regression Analysis of Count Data (Cambridge University Press: Cambridge).

Cameron, Colin A., and Pravin K. Trivedi, 1996, Count data models for financial data, in G.S. Maddala and C.R. Rao, eds.: Handbook of Statistics, Volume 14, Statistical Methods in Finance (Elsevier Science: Amsterdam, North-Holland).

Campbell, M.J., 1994, Time series regression for counts: an investigation into the relationship between sudden infant death syndrome and environmental temperature, Journal of the Royal Statistical Society A 157, 191-208.

Chang, Tiao J., M.L. Kavvas, and J.W. Delleur, 1984, Daily precipitation modeling by discrete autoregressive moving average processes, Water Resources Research 20, 565-580.

Davis, Richard A., William Dunsmuir, and Yin Wang, 2000, On autocorrelation in a Poisson regression model, Biometrika 87, 491-505.

Diebold, Francis X., Todd A. Gunther, and Anthony S. Tay, 1997, Evaluating density forecasts, U Penn working paper.

Efron, Bradley, 1986, Double exponential families and their use in generalized linear regression, Journal of the American Statistical Association 81, 709-721.

Engle, Robert, 1982, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica 50, 987-1008.

Engle, Robert F., and Jeffrey Russell, 1998, Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66, 1127-1162.

Fahrmeir, Ludwig, and Gerhard Tutz, 1994, Multivariate Statistical Modeling Based on Generalized Linear Models (Springer-Verlag: New York).

Gurmu, Shiferaw, and Pravin K. Trivedi, 1993, Variable augmentation specification tests in the exponential family, Econometric Theory 9, 94-113.

Harvey, A.C., and C. Fernandes, 1989, Time series models for count or qualitative observations, Journal of Business and Economic Statistics 7, 407-417.

Johansson, Per, 1996, Speed limitation and motorway casualties: a time series count data regression approach, Accident Analysis and Prevention 28, 73-87.

Jorgensen, Bent, Soren Lundbye-Christensen, Peter Xue-Kun Song, and Li Sun, 1999, A state space model for multivariate longitudinal count data, Biometrika 86, 169-181.

Lee, Charles M., and Mark J. Ready, 1991, Inferring trade direction from intraday data, Journal of Finance 46, 733-746.

MacDonald, Iain L., and Walter Zucchini, 1997, Hidden Markov and Other Models for Discrete-valued Time Series (Chapman and Hall: London).

McKenzie, Ed., 1985, Some simple models for discrete variate time series, Water Resources Bulletin 21, 645-650.

Rydberg, Tina H., and Neil Shephard, 1998, A modeling framework for the prices and times of trades on the NYSE, to appear in W.J. Fitzgerald, R.L. Smith, A.T. Walden, and P.C. Young, eds.: Nonlinear and Nonstationary Signal Processing (Cambridge University Press, 2000).

———, 1999a, Dynamics of trade-by-trade movements: decomposition and models, working paper, Nuffield College, Oxford.

———, 1999b, Modelling trade-by-trade price movements of multiple assets using multivariate compound Poisson processes, working paper, Nuffield College, Oxford.

Zeger, Scott L., 1988, A regression model for time series of counts, Biometrika 75, 621-629.

———, and Bahjat Qaqish, 1988, Markov regression models for time series: A quasi-likelihood approach, Biometrics 44, 1019-1031.

Model     Mean μ_t                              Variance σ²_t                                          Dispersion φ_t
ACP       ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       μ_t                                                    1
DACP1     ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       μ_t/γ                                                  γ
DACP2     ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       μ_t + γ μ²_t                                           1/(1 + γ μ_t)
GDACP     ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       h_t = ω_2 + α_2 (N_{t−1} − μ_{t−1})² + β_2 h_{t−1}     μ_t/h_t
GDACP1    ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       μ_t/γ_t                                                γ_t = M/(1 + exp(−ψ_t)),
                                                                                                       ψ_t = ω_2 + α_2 (N_{t−1} − μ_{t−1})/√μ_{t−1} + β_2 ψ_{t−1}
GDACP2    ω_1 + α_1 N_{t−1} + β_1 μ_{t−1}       μ_t + γ_t μ²_t                                         γ_t = M/(1 + exp(−ψ_t)),
                                                                                                       ψ_t = ω_2 + α_2 (N_{t−1} − μ_{t−1})/√(μ_{t−1} + γ_{t−1} μ²_{t−1}) + β_2 ψ_{t−1}

Table 1: Summary of the model specifications

Coefficients     ACP       DACP1     DACP2     GDACP     GDACP1    GDACP2
ω_1              .29       .28       .56       .12       .37       .34
                 (2.46)    (1.49)    (2.00)    (.93)     (1.72)    (1.52)
α_1              .23       .23       .36       .08       .28       .28
                 (4.83)    (2.84)    (3.47)    (1.33)    (2.49)    (2.33)
β_1              .55       .56       .21       .82       .44       .46
                 (5.02)    (3.13)    (.92)     (5.31)    (2.09)    (2.12)
ω_2                                            .20       -2.89     -3.45
                                               (1.35)    (-2.73)   (-1.22)
α_2                                            .06       -.21      .26
                                               (1.51)    (-2.66)   (1.25)
β_2                                            .84       -.09      -.12
                                               (10.5)    (-.24)    (-.13)
dispersion γ               .62       .53
                           (7.40)    (2.36)
Log-L            -261.8    -250.2    -247.8    -254.1    -246.2    -248.2
Variance of s.e. 1.70      1.05      .96       1.11      1.02      1.00

Table 2: Results from the polio dataset: no exogenous variables (t-statistics are given in parentheses).

Coefficients     ACP       DACP1     DACP2     GDACP     GDACP1    GDACP2    Zeger
ω_1              .15       .37       .14       .15       .19       .15
                 (1.64)    (1.25)    (1.01)    (1.29)    (1.18)    (1.07)
α_1              .16       .23       .17       .12       .17       .18
                 (2.92)    (2.18)    (1.75)    (1.78)    (1.77)    (1.62)
β_1              .73       .52       .72       .74       .70       .71
                 (7.65)    (2.14)    (4.81)    (5.01)    (4.37)    (4.68)
ω_2                                            1.45      -2.72     -3.81
                                               (1.99)    (-2.18)   (-1.20)
α_2                                            .30       -.21      .33
                                               (-2.01)   (-2.48)   (1.30)
β_2                                            .51       -.07      -.13
                                               (2.90)    (-.14)    (-.14)
trend            -.0013    -.0021    -.0010    -.0003    -.0019    -.0014    -.0044
                 (-1.05)   (-1.22)   (-.50)    (-.13)    (-.57)    (-.96)    (-1.62)
cos12            -.26      -.24      -.29      -.02      -.18      -.25      -.11
                 (-3.14)   (-1.89)   (-2.39)   (-.12)    (-1.51)   (-2.02)   (-.69)
sin12            -.37      -.33      -.36      -.20      -.40      -.40      -.48
                 (-3.60)   (-2.19)   (-2.65)   (-1.29)   (-2.55)   (-2.82)   (-2.82)
cos6             .18       .14       .21       .02       .21       .22       .20
                 (1.93)    (1.03)    (1.45)    (.14)     (1.55)    (1.52)    (1.43)
sin6             -.40      -.41      -.38      -.20      -.40      -.40      -.41
                 (-4.08)   (2.91)    (-2.68)   (-1.27)   (-1.96)   (-2.37)   (-2.93)
dispersion γ               .69       .38
                           (7.16)    (2.07)
Log-L            -246.5    -239.6    -238.2    -250.5    -235.9    -236.4
Variance of s.e. 1.51      .98       1.01      1.11      1.04      1.18

Table 3: Results from the polio dataset: exogenous variables (t-statistics are given in parentheses).

Coefficients     ACP       DACP1     DACP2     GDACP     GDACP1    GDACP2
ω_1              .79       .73       .69       .66       .82       .66
                 (6.35)    (3.16)    (3.13)    (3.43)    (3.50)    (3.12)
α_1              .42       .41       .41       .25       .39       .39
                 (17.5)    (8.75)    (8.55)    (5.61)    (6.82)    (7.04)
β_1              .45       .53       .48       .57       .47       .50
                 (13.2)    (13.22)   (7.34)    (7.62)    (6.37)    (7.35)
ω_2                                            1.07      -3.76     1.71
                                               (2.64)    (-3.89)   (1.11)
α_2                                            .21       -.09      -.04
                                               (4.16)    (-2.42)   (-.60)
β_2                                            .70       -.32      .59
                                               (11.2)    (-.94)    (1.61)
dispersion γ               .52       .15
                           (13.9)    (6.11)
Log-L            -1062.7   -1009.3   -1001.2   -1023.6   -1004.1   -998.8
Variance of s.e. 1.99      1.04      1.02      1.10      1.04      1.02
Forecast
RMSE             4.25      4.25      4.24      4.43      4.29      4.24
SSE              2.55      1.46      1.29      1.51      3.12      1.40
Bias (%)         .03       .03       .02       .08       .04       .03
Variance (%)     .22       .22       .21       .35       .27       .22
Covariance (%)   .75       .75      .77        .57       .69       .75

Table 4: Number of price changes larger than $.75 on the IBM stock (t-statistics are given in parentheses).

[Chart (page 29): nesting relationships between the models — Static Poisson, ACP, DACP1, DACP2, GDACP, GDACP1, GDACP2]

Figure 2: Plot and histogram of the monthly number of polio cases

Figure 3: Autocorrelations of the number of polio cases (left panel: number of polio cases; right panel: squared number of polio cases).

Figure 4: Autocorrelations of the Pearson residuals: no regressors.

Figure 5: Autocorrelations of the squared Pearson residuals: no regressors.

Figure 6: Autocorrelations of the Pearson residuals with time trend and seasonality.

Figure 7: Autocorrelations of the squared Pearson residuals with time trend and seasonality.

Figure 8: Autocorrelations of the daily number of price changes larger than $.75 on IBM (panels: the number of price changes and its square, cube, and fourth power).

Figure 9: Plot and histogram of the daily number of price changes larger than $.75 on IBM.

Figure 10: Autocorrelations of the Pearson residuals for the IBM price-change data (panels: ACP, DACP, DACP2, GDACP, GDACP1, GDACP2).
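Figures 10 and 11 check whether the fitted models remove the serial correlation: for a well-specified conditional mean and variance, the Pearson residuals (y_t − μ_t)/σ_t and their squares should be approximately uncorrelated. A minimal sketch of this diagnostic; the fitted moments here are placeholders (an i.i.d. Poisson toy model), not output of any of the paper's estimators.

```python
import numpy as np

def pearson_residuals(y, mu, var):
    """Standardize counts by the model's conditional mean and variance."""
    return (np.asarray(y, float) - np.asarray(mu, float)) / np.sqrt(np.asarray(var, float))

def acf(x, nlags=20):
    """Sample autocorrelations at lags 1..nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

# Toy example: i.i.d. Poisson(5) counts, so mu_t = var_t = 5 and the
# residual autocorrelations should be close to zero at every lag.
rng = np.random.default_rng(1)
y = rng.poisson(5.0, size=500)
e = pearson_residuals(y, mu=np.full(500, 5.0), var=np.full(500, 5.0))
rho = acf(e, nlags=10)
```

Plotting `rho` (and the analogous autocorrelations of `e**2`) against the usual ±2/√n bands reproduces the kind of display shown in Figures 10 and 11.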

Figure 11: Autocorrelations of the squared Pearson residuals for the IBM price-change data (panels: ACP, DACP, DACP2, GDACP, GDACP1, GDACP2).

Figure 12: Plot of the quantiles of the distribution of Z against the uniform quantiles (panels: ACP, DACP, DACP2, GDACP, GDACP1, GDACP2).
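Figure 12 is a density-forecast diagnostic: if the model's predictive distributions are correct, the Z statistics should behave like uniform draws, so their empirical quantiles should lie on the 45-degree line against uniform quantiles. A minimal sketch of how the inputs to such a plot could be computed, assuming Poisson predictive distributions and a randomized probability integral transform for the discrete counts; the paper's exact Z construction may differ.

```python
import numpy as np
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(Y <= k) for a Poisson(lam) variable; k = -1 gives 0."""
    return sum(lam ** j * exp(-lam) / factorial(j) for j in range(0, int(k) + 1))

def randomized_pit(y, lam, rng):
    """Randomized PIT for discrete counts: draw uniformly on
    [F(y-1), F(y)], so a correct density forecast yields U(0,1) values."""
    u = []
    for yi, li in zip(y, lam):
        lo, hi = poisson_cdf(yi - 1, li), poisson_cdf(yi, li)
        u.append(lo + rng.uniform() * (hi - lo))
    return np.array(u)

# Simulate counts from the assumed model, so the PIT should be uniform.
rng = np.random.default_rng(2)
lam = rng.uniform(3.0, 8.0, size=1000)   # hypothetical conditional means
y = rng.poisson(lam)
u = randomized_pit(y, lam, rng)

# Quantile-quantile comparison against the uniform, as in Figure 12.
emp = np.sort(u)
theo = (np.arange(1, len(u) + 1) - 0.5) / len(u)
max_gap = np.max(np.abs(emp - theo))
```

Under a correct predictive density `max_gap` stays small (on the order of the Kolmogorov-Smirnov bound 1/√n); systematic departures from the diagonal signal a misspecified predictive distribution.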

Figure 13: Autocorrelations of the Z statistics for the IBM price-change data (panels: ACP, DACP1, DACP2, GDACP, GDACP1, GDACP2).

Figure 14: Autocorrelations of the squared Z statistics for the IBM price-change data (panels: ACP, DACP1, DACP2, GDACP, GDACP1, GDACP2).