The American Statistician

ISSN: 0003-1305 (Print) 1537-2731 (Online) Journal homepage: http://amstat.tandfonline.com/loi/utas20

Bridging the Gap: A Generalized Stochastic Process for Count Data

Li Zhu, Kimberly F. Sellers, Darcy Steeg Morris & Galit Shmueli

To cite this article: Li Zhu, Kimberly F. Sellers, Darcy Steeg Morris & Galit Shmueli (2017) Bridging the Gap: A Generalized Stochastic Process for Count Data, The American Statistician, 71:1, 71-80, DOI: 10.1080/00031305.2016.1234976

To link to this article: http://dx.doi.org/10.1080/00031305.2016.1234976

Accepted author version posted online: 21 Sep 2016. Published online: 21 Sep 2016.

GENERAL Bridging the Gap: A Generalized Stochastic Process for Count Data

Li Zhu (a, b), Kimberly F. Sellers (b, c), Darcy Steeg Morris (c), and Galit Shmueli (d)

(a) CAC Fund, Shanghai, China; (b) Department of Mathematics and Statistics, Georgetown University, Washington, DC; (c) Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, DC; (d) Institute of Service Science, National Tsing Hua University, Taiwan

ABSTRACT
The Bernoulli and Poisson processes are two popular discrete count processes; however, both rely on strict assumptions. We instead propose a generalized homogenous count process (which we name the Conway–Maxwell–Poisson or COM-Poisson process) that not only includes the Bernoulli and Poisson processes as special cases, but also serves as a flexible mechanism to describe count processes that approximate data with over- or under-dispersion. We introduce the process and an associated generalized waiting time distribution with several real-data applications to illustrate its flexibility for a variety of data structures. We consider model estimation under different scenarios of data availability, and assess performance through simulated and real datasets. This new generalized process will enable analysts to model count processes that exhibit data dispersion in a more accommodating and flexible manner.

ARTICLE HISTORY
Received August; Revised March

KEYWORDS
Conway-Maxwell-Poisson (COM-Poisson) distribution; Count process; Dispersion; Poisson process; Waiting time

CONTACT Kimberly F. Sellers, kfs@georgetown.edu, Center for Statistical Research & Methodology, U.S. Census Bureau, Washington, DC. © American Statistical Association

1. Introduction

Throughout history, stochastic processes have been developed to model data that arise in different disciplines, most notably transportation, marketing, and finance. With the rapid development of modern technology, count data have become common in further areas. As a result, stochastic process models can play a more significant role in today's data analysis toolkit.

The simplest discrete stochastic process is the Bernoulli process, whose associated waiting times are geometric. Meanwhile, the Poisson process is the most popular and widely used stochastic process for count data. Often considered the continuous-time counterpart of the Bernoulli process, its most distinguishing property is its underlying assumption of equi-dispersion (i.e., the average number of arrivals equals the variance) in a fixed time period. This assumption, however, is constraining and problematic because many real-world applications contain count data that fail to satisfy the equi-dispersion property. Barndorff-Nielsen and Yeo (1969) and Diggle and Milne (1983) considered negative binomial point processes or, more broadly, "any flexible class of distributions with a variance-to-mean ratio greater than unity" (Diggle and Milne 1983, p. 257). While such models can account for data over-dispersion (i.e., where the variance is larger than the mean), they cannot effectively model data under-dispersion (i.e., where the variance is smaller than the mean).

In this article, we use the Conway–Maxwell–Poisson (COM-Poisson) distribution to derive what we call a COM-Poisson process. The significance of the COM-Poisson distribution and, hence, the corresponding process lies in its ability to represent a family of count processes (including the Poisson, geometric, and Bernoulli processes) and, further, to model a sequence of arrivals whose data display over- or under-dispersion. With the COM-Poisson process, we develop a generalized waiting time distribution that encompasses the waiting time distributions associated with the Bernoulli and Poisson processes, and that models the distribution of waiting times for over- or under-dispersed data. Our work develops the COM-Poisson process not only to bridge the gap between two classical count processes, but also to introduce a process that can address a wide range of data dispersion.

The remainder of the article is organized as follows. Section 2 provides background and motivation, briefly reviewing the Bernoulli and Poisson counting processes, along with their respective frameworks and associated properties. Section 3 formally introduces the COM-Poisson and sum-of-COM-Poisson (sCOM-Poisson) distributions, and uses them to develop the COM-Poisson process and study its properties, including the derivation of the associated generalized waiting time distribution. Section 4 discusses parameter estimation and associated uncertainty quantification. Section 5 considers estimation robustness under different simulated data scenarios. Further, this section illustrates the flexibility of the COM-Poisson process when applied to real-world data, comparing this process with the Bernoulli, Poisson, and other count processes that address data dispersion. Finally, Section 6 provides a discussion and future directions.

2. Classical Counting Processes

Çinlar (1975) defined a Bernoulli process with success probability $p$ as a discrete-time stochastic process of the form $\{X_n;\ n = 1, 2, \ldots\}$ where, for all $n$, $X_1, \ldots, X_n$ are independent, and $X_n$ takes values in $\{0, 1\}$ with $P(X_n = 1) = p$ and $P(X_n = 0) = q = 1 - p$. The number of successes that have occurred through the $n$th trial, $N_n = X_1 + \cdots + X_n$, follows a
$\mathrm{Binomial}(n, p)$ distribution, with $N_0 = 0$. For any $m, n \in \mathbb{N}$, $N_{m+n} - N_m$ also follows a $\mathrm{Binomial}(n, p)$ distribution, independent of $m$; that is, the process has stationary (and independent) increments. In terms of waiting times, $T_k$ is defined as the number of trials it takes to get the $k$th success. The time between successes, $T_{k+1} - T_k$ for any $k \in \mathbb{N}$, follows a geometric distribution, that is, $P(T_{k+1} - T_k = m) = p q^{m-1}$, $m = 1, 2, \ldots$. Further, for any $m$ and $n$, the distribution of $T_{m+n} - T_m$ does not depend on $m$ (Çinlar 1975).

Meanwhile, a Poisson process is a popular continuous-time stochastic counting process used to model events such as customer arrivals, electron emissions, and neuron spike activity (Kannan 1979). Let $N_t$ denote the number of events that have occurred up to time $t \geq 0$. By definition (Durrett 2004),

• $N_0 = 0$;
• $N_{s+t} - N_s \sim \mathrm{Poisson}(\lambda t)$ [in particular, $N_{s+1} - N_s \sim \mathrm{Poisson}(\lambda)$];
• $N_t$ has independent increments, that is, for $t_0 < t_1 < \cdots < t_n$, the variables $N_{t_1} - N_{t_0}, \ldots, N_{t_n} - N_{t_{n-1}}$ are independent.

The following relations hold:

$$P(N_{s+t} - N_s = 1) = \lambda t + o(t), \qquad P(N_{s+t} - N_s \geq 2) = o(t),$$

where a function $f(t)$ is said to be of order $o(t)$ if $\lim_{t \to 0} f(t)/t = 0$ (Kannan 1979). Thus, a homogenous Poisson process has a rate/intensity parameter $\lambda$ such that the number of events to occur in the interval $(t, t + \tau]$ is $\mathrm{Poisson}(\lambda\tau)$, that is,

$$P(N_{t+\tau} - N_t = i) = \frac{e^{-\lambda\tau} (\lambda\tau)^i}{i!}, \quad i = 0, 1, 2, \ldots,$$

and the associated waiting time between events, $T_{k+1} - T_k$, is exponentially distributed with parameter $\lambda$.

3. The Conway–Maxwell–Poisson (COM-Poisson) Process

The Conway–Maxwell–Poisson (COM-Poisson) distribution is a two-parameter generalization of the Poisson distribution introduced by Conway and Maxwell (1962), whose statistical properties were studied by Shmueli et al. (2005); see Sellers, Shmueli, and Borle (2011) for a comprehensive discussion of this distribution and its use in various applications and statistical methods. Allowing for both data over- and under-dispersion, this distribution possesses desirable statistical properties (e.g., it belongs to the exponential family in both parameters), conceptual appeal (it generalizes the Poisson, geometric, and Bernoulli distributions), and practical use (it can be used to fit a wide range of count data). In this article, we use the COM-Poisson distribution as the basis for developing the COM-Poisson process. The latter will serve as a flexible stochastic process that generalizes the Poisson process and allows for over- and under-dispersion in counts of events because it is a special case of a weighted Poisson process as described by Balakrishnan and Kozubowski (2008).

In the following, we develop the COM-Poisson process by first introducing relevant, motivating distributions. Section 3.1 describes the COM-Poisson and sum-of-COM-Poisson (sCOM-Poisson) distributions, and highlights some of their statistical properties. Section 3.2 uses these distributions to derive a homogenous COM-Poisson process. Finally, Section 3.3 introduces a generalized waiting time distribution associated with the COM-Poisson process.

3.1. The COM-Poisson and sCOM-Poisson Distributions

The COM-Poisson probability mass function (pmf) takes the form

$$P(X = x) = \frac{\lambda^x}{(x!)^\nu Z(\lambda, \nu)}, \quad x = 0, 1, 2, \ldots,$$

for a random variable $X$, where $\nu \geq 0$ is a dispersion parameter such that $\nu = 1$ denotes equi-dispersion, while $\nu > 1$ ($\nu < 1$) signifies under-dispersion (over-dispersion); $Z(\lambda, \nu) = \sum_{j=0}^{\infty} \lambda^j / (j!)^\nu$ is a normalizing constant (Shmueli et al. 2005); and $\lambda = E(X^\nu) > 0$. The COM-Poisson distribution includes three well-known distributions as special cases: the Poisson ($\nu = 1$), geometric ($\nu = 0$ and $\lambda < 1$), and Bernoulli ($\nu \to \infty$) distributions. The expected value and variance of the COM-Poisson distribution can be presented as derivatives with respect to $\ln(\lambda)$ (Sellers, Shmueli, and Borle 2011):

$$E(X) = \frac{\partial \ln Z(\lambda, \nu)}{\partial \ln(\lambda)} \quad \text{and} \quad \mathrm{var}(X) = \frac{\partial^2 \ln Z(\lambda, \nu)}{\partial (\ln \lambda)^2}; \tag{1}$$

more generally, the probability and moment generating functions are $G_X(t) = E(t^X) = Z(\lambda t, \nu)/Z(\lambda, \nu)$ and $M_X(t) = Z(\lambda e^t, \nu)/Z(\lambda, \nu)$, respectively. As a weighted Poisson distribution whose weight function is $w(x) = (x!)^{1-\nu}$ (Kokonendji, Mizère, and Balakrishnan 2008), the COM-Poisson distribution belongs to both the exponential family and the two-parameter power series family (Shmueli et al. 2005; Sellers, Shmueli, and Borle 2011).

The sum of $n$ iid COM-Poisson variables leads to the sCOM-Poisson$(\lambda, \nu, n)$ distribution. For a random variable $Y = \sum_{i=1}^{n} X_i$, where $X_i \overset{iid}{\sim}$ COM-Poisson$(\lambda, \nu)$, the pmf is

$$P(Y = y) = \frac{\lambda^y}{(y!)^\nu [Z(\lambda, \nu)]^n} \sum_{\substack{x_1, \ldots, x_n = 0 \\ x_1 + \cdots + x_n = y}}^{y} \binom{y}{x_1 \ \cdots \ x_n}^{\nu}, \quad y = 0, 1, 2, \ldots;\ n \in \mathbb{N},$$

where $\binom{y}{x_1 \cdots x_n}$ is a multinomial coefficient. The sCOM-Poisson$(\lambda, \nu, n)$ distribution encompasses as special cases the Poisson$(n\lambda)$ distribution when $\nu = 1$, the negative binomial NB$(n, 1 - \lambda)$ distribution when $\nu = 0$ and $\lambda < 1$, and the Binomial$(n, \lambda/(\lambda + 1))$ distribution as $\nu \to \infty$. The special case sCOM-Poisson$(\lambda, \nu, n = 1)$ is a COM-Poisson$(\lambda, \nu)$ distribution. The sCOM-Poisson$(\lambda, \nu, n)$ distribution has moment generating function $M_Y(t) = [Z(\lambda e^t, \nu)/Z(\lambda, \nu)]^n$. The COM-Poisson and sCOM-Poisson distributions are both helpful in our development of a flexible homogenous count process to capture data over- or under-dispersion.

3.2. The Homogenous COM-Poisson Process

Let $N_t$ denote the number of events that have occurred up to time $t \in \mathbb{N}_0$, where the number of events in a unit interval of time follows the COM-Poisson distribution, that is,

$$P(N_{t+1} - N_t = i) = \frac{\lambda^i}{(i!)^\nu Z(\lambda, \nu)}, \quad i = 0, 1, 2, \ldots, \tag{2}$$

where $\lambda > 0$ and $\nu \geq 0$. This homogenous COM-Poisson process follows a COM-Poisson distribution over the interval $(t, t + 1]$ with associated parameters $\lambda$ and $\nu$. In other words, $N_{t+1} - N_t \sim$ COM-Poisson$(\lambda, \nu)$, and the process has independent increments. For the special case $\nu = 1$, this is the homogenous Poisson process with parameter $\lambda$. By definition,

• $N_0 = 0$;
• $N_{s+t} - N_s \sim$ sCOM-Poisson$(\lambda, \nu, t)$;
• $N_t$ has independent and stationary increments, that is, for ordered time points $t_0 < t_1 < \cdots < t_n$, the variables $N_{t_1} - N_{t_0}, \ldots, N_{t_n} - N_{t_{n-1}}$ are independent.

Thus, a homogenous COM-Poisson process has a rate parameter $\lambda$ and dispersion parameter $\nu$ so that the number of events to occur in the interval $(t, t + \tau]$ follows an sCOM-Poisson$(\lambda, \nu, \tau)$ distribution for $\tau \in \mathbb{N}$, independent of $t$. Given its distributional form, the probability of a single event in the interval $(t, t + \tau]$ is

$$P(N_{t+\tau} - N_t = 1) = \frac{\lambda^1}{(1!)^\nu [Z(\lambda, \nu)]^\tau} \sum_{\substack{a_1, \ldots, a_\tau = 0 \\ a_1 + \cdots + a_\tau = 1}}^{1} \binom{1}{a_1 \ \cdots \ a_\tau}^{\nu} = \frac{\lambda\tau}{[Z(\lambda, \nu)]^\tau}, \tag{3}$$

since exactly $\tau$ index vectors $(a_1, \ldots, a_\tau)$ sum to 1, each with multinomial coefficient 1. For the special case $\nu = 1$, this reduces to $P(N_{t+\tau} - N_t = 1) = \lambda\tau + o(\tau)$, as noted in Section 2.

3.3. A Generalized Waiting Time Distribution

We have introduced the COM-Poisson distribution to model the number of events that occur in a period of time. We now consider the associated generalized distribution that describes the waiting time until the next occurrence. To find the waiting time distribution, we consider the probability of no events occurring in a time interval of length $\tau$:

$$P(N_{t+\tau} - N_t = 0) = \frac{\lambda^0}{(0!)^\nu [Z(\lambda, \nu)]^\tau} = \frac{1}{[Z(\lambda, \nu)]^\tau} = P(T_t > \tau),$$

thus

$$P(T_t \leq \tau) = 1 - \frac{1}{[Z(\lambda, \nu)]^\tau}. \tag{4}$$

Equation (4) is the cumulative distribution function (cdf) of the COM-Poisson process waiting time. Given a continuous-time process, we differentiate Equation (4) to obtain the density function,

$$f(\tau) = [Z(\lambda, \nu)]^{-\tau} \ln(Z(\lambda, \nu)) = \ln(Z(\lambda, \nu))\, e^{-\tau \ln Z(\lambda, \nu)}, \quad \tau \geq 0. \tag{5}$$

This function has the form of an exponential distribution with parameter $\ln(Z(\lambda, \nu))$; we call this a COM-exponential distribution. If we consider instead a discretized waiting time $T_t^* = \lceil T_t \rceil$ (and hence a discrete-time process), where $\lceil \cdot \rceil$ denotes the ceiling function, $T_t^*$ has pmf

$$P(T_t^* = \tau) = \frac{Z(\lambda, \nu) - 1}{[Z(\lambda, \nu)]^\tau} = \left(1 - \frac{1}{Z(\lambda, \nu)}\right)\left(\frac{1}{Z(\lambda, \nu)}\right)^{\tau - 1}, \quad \tau = 1, 2, 3, \ldots. \tag{6}$$

This distribution is geometric with success parameter $1 - 1/Z(\lambda, \nu)$; hereafter, we refer to this form as a COM-geometric distribution.

The relationship between the COM-exponential and COM-geometric waiting times is consistent with the relationship between the exponential and geometric distributions in general. The exponential and geometric distributions are known to share similar properties and to be continuous and discrete analogs of each other. In particular, for an exponentially distributed random variable $X$ with parameter $\mu$, the random variable $X^* = \lceil X \rceil$ has a geometric distribution with parameter $p = 1 - e^{-\mu}$ on the set $\{1, 2, 3, \ldots\}$ (Stephens 2012, p. 66); letting $\mu = \ln Z(\lambda, \nu)$ in our case gives $p = 1 - e^{-\ln Z(\lambda, \nu)} = 1 - 1/Z(\lambda, \nu)$. Stephens (2012) further noted that the cdfs corresponding to the geometric and exponential distributions are nearly identical for small values of $t$; hence, the same is true for our case defined above.

The waiting times associated with the Poisson and Bernoulli processes are special cases of Equations (5) and (6), respectively. The Poisson process is a special case of the continuous-time COM-Poisson process with $\nu = 1$, where $Z(\lambda, 1) = e^\lambda$; thus

$$f(\tau) = \ln(Z(\lambda, \nu))\, e^{-\tau \ln Z(\lambda, \nu)} = \lambda e^{-\lambda\tau}, \quad \tau \geq 0,$$

that is, an exponential distribution with parameter $\lambda$. For the Bernoulli process (i.e., $\nu \to \infty$, where $Z(\lambda, \nu) \to 1 + \lambda$), we use Equation (6), recognizing that $T_t^*$ is discrete, to get

$$P(T_t^* = \tau) = \frac{Z(\lambda, \nu) - 1}{[Z(\lambda, \nu)]^\tau} = \frac{(1 + \lambda) - 1}{(1 + \lambda)^\tau} = \frac{\lambda}{(1 + \lambda)^\tau} = \frac{\lambda}{1 + \lambda}\left(\frac{1}{1 + \lambda}\right)^{\tau - 1}, \quad \tau = 1, 2, 3, \ldots.$$

4. Parameter Estimation

In this section, we introduce the modeling procedure and associated variability study as they relate to the COM-Poisson process. For ease of illustration, we discuss these procedures in the context of a discrete waiting time, and hence use the COM-geometric waiting time distribution to model the time until the next event. We can similarly pursue these approaches through the continuous-time analog directly, or by refining our time structure into small enough discrete time increments to approximate continuity (Stephens 2012).

First, consider the special case of a COM-Poisson process over a single time unit. The number of events that occur in one time unit is approximated by a COM-Poisson variable. The following two applications illustrate the modeling procedure for such a process.

• Method 1 (Data on Number of Events in a Single Time Unit): Suppose we have an ordered sequence of count data $x_1, \ldots, x_n$ that is well approximated by a COM-Poisson distribution. To identify the respective waiting time distribution, we first apply the maximum likelihood method to estimate $\lambda$ and $\nu$ by maximizing the COM-Poisson log-likelihood,

$$\ln L(\lambda, \nu \mid x_1, \ldots, x_n) = \ln(\lambda) \sum_{i=1}^{n} x_i - \nu \sum_{i=1}^{n} \ln(x_i!) - n \ln Z(\lambda, \nu). \tag{7}$$

We can then show that the waiting time follows a geometric distribution with parameter $\hat{p} = 1 - 1/Z(\hat\lambda, \hat\nu)$, where $\hat\lambda$ and $\hat\nu$ are the maximum likelihood estimates of $\lambda$ and $\nu$. Meanwhile, the uncertainty associated with these estimates is often reflected through their standard errors. The sampling distributions associated with $\lambda$ and $\nu$, however, are known to be skewed (see, e.g., Sellers and Shmueli 2013); hence, we consider two approaches to uncertainty quantification: nonparametric bootstrapping, and using the information matrix to determine standard errors. For the latter approach, the information matrix is

$$I(\lambda, \nu) = -n \cdot E\begin{pmatrix} \dfrac{\partial^2 \ln P(X = x)}{\partial \lambda^2} & \dfrac{\partial^2 \ln P(X = x)}{\partial \lambda\, \partial \nu} \\[2ex] \dfrac{\partial^2 \ln P(X = x)}{\partial \lambda\, \partial \nu} & \dfrac{\partial^2 \ln P(X = x)}{\partial \nu^2} \end{pmatrix},$$

where

$$\frac{\partial^2 \ln P(X = x)}{\partial \lambda^2} = -\frac{X}{\lambda^2} + \frac{1}{[Z(\lambda, \nu)]^2}\left(\frac{\partial Z(\lambda, \nu)}{\partial \lambda}\right)^2 - \frac{1}{Z(\lambda, \nu)} \frac{\partial^2 Z(\lambda, \nu)}{\partial \lambda^2},$$

$$\frac{\partial^2 \ln P(X = x)}{\partial \lambda\, \partial \nu} = \frac{1}{[Z(\lambda, \nu)]^2} \frac{\partial Z(\lambda, \nu)}{\partial \lambda} \frac{\partial Z(\lambda, \nu)}{\partial \nu} - \frac{1}{Z(\lambda, \nu)} \frac{\partial^2 Z(\lambda, \nu)}{\partial \lambda\, \partial \nu},$$

$$\frac{\partial^2 \ln P(X = x)}{\partial \nu^2} = \frac{1}{[Z(\lambda, \nu)]^2}\left(\frac{\partial Z(\lambda, \nu)}{\partial \nu}\right)^2 - \frac{1}{Z(\lambda, \nu)} \frac{\partial^2 Z(\lambda, \nu)}{\partial \nu^2},$$

and $E(X)$ is given by Equation (1). To compute parameter estimates and associated variation via nonparametric bootstrapping, we randomly draw 1000 samples with replacement from the data using the boot package (Canty and Ripley 2015) in R.

• Method 2 (Wait-Time Data): For discrete time, the waiting time distribution can be viewed as the pmf of a geometric distribution with parameter $p = 1 - 1/Z(\lambda, \nu)$. For parameter estimation, the log-likelihood function of the waiting time distribution is

$$\ln L(\lambda, \nu \mid t_1, \ldots, t_n) = n \ln(Z(\lambda, \nu) - 1) - \ln(Z(\lambda, \nu)) \sum_{i=1}^{n} t_i,$$

and the maximum likelihood value is achieved when

$$\frac{1}{Z(\lambda, \nu)} = 1 - \frac{1}{\bar{t}}, \tag{8}$$

where $\bar{t} = n^{-1} \sum_{i=1}^{n} t_i$ is the average waiting time. Equation (8) is an underdetermined system (Soldatov 2011) because it is one equation in two unknowns ($\lambda$ and $\nu$). This implies a uniqueness issue: infinitely many $\{\lambda, \nu\}$ pairs satisfy Equation (8) equally well. This is a powerful result, as one can choose any combination of COM-Poisson variables, including Bernoulli ($\nu \to \infty$) and Poisson variables, with appropriate $\hat\lambda$ such that Equation (8) holds. However, suppose instead that, along with an ordered sequence of waiting times $t_1, \ldots, t_n$, we have information about the variance-to-mean ratio of the counts (i.e., the "dispersion index"). This additional information allows us to obtain a unique solution for $\lambda$ and $\nu$. Our goal remains to identify a COM-Poisson process over one time unit that best fits the waiting time data. Equation (8) allows one to estimate $Z(\lambda, \nu)$, while Equation (1) implies that the dispersion index equals

$$\frac{\mathrm{var}(Y)}{E(Y)} = \frac{\partial^2 \ln Z(\lambda, \nu) / \partial (\ln \lambda)^2}{\partial \ln Z(\lambda, \nu) / \partial \ln(\lambda)}. \tag{9}$$

We can use Equations (8) and (9) to obtain the estimates $\hat\lambda$ and $\hat\nu$. Uncertainty quantification can be determined via nonparametric bootstrapping, again resampling 1000 times using the boot package in R.

• Method 3 (Data on Number of Events in an $s$-Unit Interval): Suppose we have a random sample of event counts $y_1, \ldots, y_n$ from an sCOM-Poisson$(\lambda, \nu, s)$ distribution with $s \geq 1$. To estimate $\lambda$ and $\nu$, we maximize the log-likelihood function of the sCOM-Poisson distribution. The associated uncertainty quantification can be determined either via the standard errors from the corresponding information matrix (which is provided in the Appendix) or using nonparametric bootstrapping.

5. Fitting the COM-Poisson Process to Data

5.1. Data Simulations

We estimate the COM-Poisson process model from simulated Poisson count data to illustrate the three parameter estimation procedures described in Section 4. First, we simulate a random sample of size 50 from a Poisson($\lambda = 2$) distribution. We model the count data assuming an sCOM-Poisson$(\lambda, \nu, n = 2)$ distribution and obtain the associated parameter estimates. Then, we repeat the simulation 500 times and inspect the corresponding parameter estimates obtained from each simulation. Based on the theory, we expect $\lambda = 2/n = 2/2 = 1$ and $\nu = 1$. The distributions of the estimates ($\hat\lambda$ and $\hat\nu$) from the 500 simulations are provided in Figure 1.

Table . Coverage probability analysis:  datasets of  data values each are generated from a COM-Poisson(λ = 1.5,ν) distribution with ν ∈{0.5, 1, 2, 10}, and the true value is compared with the respective % nominal coverage confi- dence intervals obtained via the information matrix (i.e., the confidence interval obtained by the maximum likelihood estimate ±1.96SE) and nonparametric boot- strapping (i.e., the percentile-based % confidence interval).

Coverage probabilities Truth Info. Mat. Nonpar. Boot

λ . . . ν . . . λ . . . ν . . . λ . . . ν . . . λ . . . ν . . .

different dispersion levels; because the process samples with replacement, resulting datasets may contain over-, equi-, or under-dispersion, such that the sampling distribution for ν can vary far more widely than that for λ (Sellers and Shmueli 2013). In this example, the percentile-based 95% confidence interval suggests that bootstrapped resamples generally range in their level of under-dispersion from moderate (7.016) to the extreme λ,ˆ νˆ Figure . ( ) pairs and corresponding marginal distributions from  simu- Bernoulli case (with an estimated dispersion, 34.139). The lations of Poisson() data with sample size = . Theoretically, both λ,ˆ νˆ have expected values equal to . associated bias (8.164) indicates, however, that the dispersion level can achieve any potential type of dispersion. marginal distributions are both right-skewed where the respec- Investigating this matter further, Table 2 displays the respec- tive medians of the sampling distributions provide reasonable tive coverage probabilities corresponding to each of the previ- ously considered simulated examples. To determine the respec- approximations to the theoretical parameter values. Meanwhile, jointly, there appears to be a relative linear association between tive coverage probabilities, we generate 500 datasets of 500 data λ ν values from a COM-Poisson(λ = 1.5,ν)distributionwithν ∈ and ;seeFigure 1. { . , , , } We also conduct a simulation study to compare the 0 5 1 2 10 ,andcomparethetruevaluetotherespective95% two approaches for quantifying uncertainty; see Table 1. nominal coverage confidence intervals obtained via the infor- mation matrix (i.e., the confidence interval obtained by the max- We randomly generate 500 data values from a COM- ± . 
Poisson(λ = 1.5,ν)distributionwithν ∈{0.5, 1, 2, 10}, imum likelihood estimate 1 96SE) and nonparametric boot- andusetheresultingdatatoestimateλˆ and νˆ via maxi- strapping (i.e., the percentile-based 95% confidence interval) mum likelihood estimation. To quantify uncertainty via the approaches; see Table 2 for details. Again, we find that both approaches perform comparably and information matrix approach, we obtain the standard errors ν ν>> associated with the maximum likelihood estimates. Meanwhile, reasonably for close to 1. For 1, however, we see that the nonparametric bootstrap coverage probability is severely for the nonparametric bootstrapping procedure, we consider ν = the percentile-based 95% confidence interval; see Table 1. underestimated (coverage probability for 10 is 0.382). In Both approaches produce similar results for ν values close to this example, the simulated datasets each contain many 0s and 1. For ν>>1, however, we see that the bootstrap method 1s, and relatively few 2s. Accordingly, there are many cases where succumbs to nuances of varying data resamples that display the bootstrap procedure sampling with replacement produces datasets with a binary structure. Under such conditions, the esti- mation procedure recognizes the data as Bernoulli outcomes Table . Uncertainty quantification comparisons for λˆ and νˆ obtained via the infor- and estimates the dispersion parameter as 30 or more. In fact, all mation matrix and nonparametric bootstrapping, respectively. Bootstrap bias asso- θˆ (θ)ˆ = 1 1000 θˆ − θˆ b of the resulting bootstrapped CIs that did not contain the true ciated with estimate is bias 1000 b=1 b ,where is the bootstrap index . True values are λ = 1.5 and ν ∈{0.5, 1, 2, 10}, respectively. value (ν = 10) consistently estimated the dispersion parame- Nonpar. Boot. ter to be at least 30. Thus, while the low coverage probability eludes to potential over-estimation of the dispersion parameter, Truth MLE Info. Mat. 
SE Bias Percentile-based % CI we are actually seeing an artifact of the bootstrap variation and λ . . . . (.,.) its impact on the estimation procedure. ν . . . . (.,.) λ . . . . (.,.) ν . . . . (.,.) 5.2. Real-Data Examples λ . . . . (.,.) ν . . . . (.,.) λ . . . . (.,.) In this section, we use two sets of real count data to illustrate ν . . . . (.,.) the flexibility and predictive power of the COM-Poisson process and compare its performance to other established count models, 76 L. ZHU ET AL. namely, the Bernoulli and Poisson processes, as well as mod- those resulting estimates are used to determine the waiting time eling via a negative binomial or a condensed Poisson (whose parameter estimate. Some of the resampled datasets produced ( = ) = −λt ( + λt ) ν>ˆ pmf for some random variable X is P X 0 e 1 2 ; 0, hence those estimates are not consistent with the special (λ )2x−1 (λ )2x (λ )2x+1 P(X = x) = e−λt ( t + t + t ), x = 1, 2,...;see case of a geometric (i.e., the COM-Poisson distribution where 2(2x−1)! (2x)! 2(2x+1)! ν = Chatfield and Goodhardt1973 ( ) and Johnson, Kemp, and Kotz 0) distribution. Nonetheless, we see that the resulting (2005)) to address over- or under-dispersed data, respectively. confidence intervals are very similar. For model comparison using Akaike’s information criterion Meanwhile, assume that we are given only the data dis- (AIC), Burnham and Anderson (2002) suggest considering persion index (1.810) and the waiting times associated with ¯ = .  = AIC − AICmin,whereAICmin is the minimum of the Figure 2 (t 3 33). Using Equations (8)and(9)(Method2), i i ˆ different AIC values, thus inferring that the best model has we obtain λ = 0.382 [95% CI = (0.357,0.431)] and νˆ = 0 [95%  = 0 and the other models have >0. 
Accordingly, analysts CI = (0.000,0.000)], and the associated waiting time distribution can compare models as being a best approximating model via is geometric with parameter 0.382 [95% CI = (0.356,0.431)]. these difference measures in that “models having  ≤ 2have As expected, since νˆ = 0, we obtain a geometric waiting time i ˆ substantial support (evidence), those in which 4 ≤ i ≤ 7 with parameters that equal the respective λ under each method. have considerably less support, and models having i > 10 While we obtain equal estimates for ν under either approach, have essentially no support” (Burnham and Anderson 2002; the estimates for λ vary, due to the lesser information provided pp. 70–71). We will apply this approach for model comparison in Method 2. By only having waiting time data, analysts know accordingly. only that at least one event has occurred by the stated time—not howmanyeventsoccurredbythattime.Knowledgeofthedis- persion level allows us to uniquely determine estimates for λ and Example 1: Fetal lamb movements. ν, yet lack of knowledge persists in that multiple data constructs Figure 2 shows the number of movements by a fetal lamb canproducethesamedispersionlevel. observed by ultrasound and counted in successive 5 sec Given the discussion above, it is interesting that the non- intervals (Guttorp 1995). The data indicate over-dispersion, parametric bootstrap percentile-based 95% confidence intervals with var(Y )/E(Y ) = 1.810. Using the procedure described derived under Method 2 are smaller than those for Method 1. for counts of events (Method 1), we obtain λˆ = 0.277 [95% Under Method 2, we assume the dispersion index is known CI = (0.222,0.337)] and νˆ = 0 [95% CI = (0.000,0.444)], and and thus can only resample the waiting times between events. 
the associated waiting time distribution is geometric with In actuality, however, bootstrapping the real data would intro- parameter 0.277 [95% CI = (0.222,0.333)], as given by Equation duce variability in both the dispersion index and waiting (6). Note the difference between the confidence intervals for time values, where both components impact the estimation λ ν λ and the waiting time parameter, even though they share the of , . Thus, bootstrapping for Method 2 is performed in same estimated values (which is expected, since νˆ = 0). This a restrictive manner, constraining the amount of quantifiable difference in bounds is due to the nonparametric bootstrapping variation. λˆ νˆ procedure that is performed to determine the percentile- Figure 3 displays the bootstrap distribution of and for based confidence interval for the waiting time parameter. The Methods 1 and 2. We observe comparable results for the respec- procedure resamples the data, from which λ, ν are estimated and tive distributional shape of both parameters under both meth- ods. We see via the histograms for νˆ in Figure 3 (and, more prevalently, in Figure 4) the greater skewness associated with the sampling distribution for νˆ, compared to that for λˆ . Figure 4 demonstrates the joint dependence of λ and ν in maximizing the log-likelihood function described in Equation (7). Table 3 provides the process comparisons (based on AIC) of the COM-Poisson process when compared with Poisson, negative binomial, Bernoulli, and condensed Poisson processes used to fit the fetal lamb data. These results show that the neg- ative binomial process performs best with the COM-Poisson a close second in performance, outperforming the other distribu- tions considered. This confirms the theoretical results shown in Section 3.2 where the negative binomial distribution is a spe- cial case of the sCOM-Poisson distribution for ν = 0. 
Figure 2. The number of movements by a fetal lamb observed by ultrasound and counted in successive 5 sec intervals (Guttorp 1995).

Figure 3. Histogram representations of sampling distributions for λ̂ (first column) and ν̂ (second column) under Methods 1 (first row) and 2 (second row). Sampling distributions were obtained by conducting nonparametric bootstrapping on the fetal lamb movement dataset.

Because we obtain the maximum likelihood estimate ν̂ = 0 for this dataset, the negative binomial process is optimal here. Restricting attention to COM-Poisson processes, we find that the geometric process best models the data structure. Because the geometric distribution is a special case of the negative binomial distribution, the more flexible negative binomial process will outperform the geometric process. The small difference in AIC between the negative binomial and COM-Poisson processes (Δi = 1.9) nonetheless demonstrates substantial support that the COM-Poisson process is likewise a Kullback–Leibler best model (Burnham and Anderson 2002).
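Burnham and Anderson (2002) define Δi = AICi − AICmin and treat models with Δi ≤ 2 as having substantial support; Δi also yields Akaike weights and evidence ratios. The sketch below computes these quantities. Only Δi = 1.9 (and Δi = 1.0 for the Rio Negro data in Example 2) comes from the text; the AIC values in the example dictionary are hypothetical placeholders, not those of Table 3, and the function names are our own.

```python
import math


def aic_deltas(aics):
    """Delta_i = AIC_i - AIC_min for a dict mapping model name -> AIC."""
    best = min(aics.values())
    return {name: aic - best for name, aic in aics.items()}


def akaike_weights(aics):
    """Akaike weights w_i = exp(-Delta_i / 2) / sum_r exp(-Delta_r / 2)."""
    rel = {name: math.exp(-d / 2.0) for name, d in aic_deltas(aics).items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


def evidence_ratio(delta):
    """Relative support for the best model over a model with this Delta_i."""
    return math.exp(delta / 2.0)


# Hypothetical AIC values (placeholders only); Delta = 1.9 mirrors the text.
hypothetical = {"Neg-Bin": 100.0, "COM-Poisson": 101.9, "Poisson": 110.0}
print(aic_deltas(hypothetical))
print(akaike_weights(hypothetical))
print(round(evidence_ratio(1.9), 2))  # 2.59
```

Here evidence_ratio(1.9) ≈ 2.59 and evidence_ratio(1.0) ≈ 1.65: the best-ranked process is only about 2.6 (fetal lamb) or 1.65 (Rio Negro) times as likely as the COM-Poisson process within the candidate set, which is the sense in which Δi ≤ 2 indicates substantial support.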

Example 2: Flooding on the Rio Negro River. Figure 5 shows the years of major floods between 1892 and 1992 (inclusive) on the Rio Negro River in Brazil (Brillinger 1995; Guttorp 1995); the data are under-dispersed with var(Y)/E(Y) = 0.800. Using the method described for events data (Method 1), we obtain the maximum likelihood estimates λ̂ = 0.262 [95% CI = (0.148,0.423)] and ν̂ = 30.197 [95% CI = (28.78,30.94)], and the associated waiting time distribution is therefore geometric with parameter 0.208 [95% CI = (0.129,0.297)]; see Equation (6). Instead, suppose we are

Table . Model comparison for the fetal lamb movement data; i = AICi − AICmin as described in Burnham and Anderson (). Model Poisson Neg-Bin Bernoulli Con. Poisson COM-Poisson

Figure . Contour plot of log-likelihood values (where the log-likelihood is defined AIC . . . . . λ ν in Equation ()) for various values of and when modeling the fetal lamb move-  . . . . . ment dataset as a COM-Poisson(λ, ν) process. i 78 L. ZHU ET AL.

only given the data dispersion index (0.800) and the interflood durations (t̄ = 4.85). Using Equations (8) and (9) (Method 2), we obtain the maximum likelihood estimates λ̂ = 0.259 [95% CI = (0.187,0.513)] and ν̂ = 6.221 [95% CI = (2.22,41.04)], implying a geometric wait time distribution with parameter 0.206 [95% CI = (0.157,0.364)]. While the estimates for ν differ considerably between the two methods, the resulting estimated wait time distributions are very similar.

Figure 5. The number of floods on the Rio Negro River.

The above results demonstrate the relative robustness of λ̂, while ν̂ shows broad variation in its sampling distribution. The bootstrap distributions of λ̂ and ν̂ in Figure 6 show a skewed distribution for λ̂ under either method, yet the range of possible values for λ̂ remains small in both cases. Meanwhile, the sampling distribution for ν̂ differs vastly depending on the method of bootstrapping. Bootstrapping in the usual way for Method 1 produces a skewed distribution for ν̂ consistent with values expected from a Bernoulli process, as seen empirically in Sellers and Shmueli (2010) and Sellers (2012). Bootstrapping in accordance with Method 2, however, yields a bimodal (almost U-shaped) distribution in which values of ν̂ fall either in [0, 10] (with most of the frequency in [0, 5]) or in [25, 45] (with most of the frequency in [35, 40]). Figure 7 displays the contour plot of the log-likelihood function given in Equation (7) for this dataset; it shows a wide range of possible ν̂ values but only a narrow range of λ̂ values for which the {λ̂, ν̂} pairs produce a maximum log-likelihood value.

Table 4 compares the COM-Poisson process with the Poisson, negative binomial, Bernoulli, and condensed Poisson processes

Figure . Histogram representations of sampling distributions for λˆ (first column) and νˆ (second column), respectively, under Methods  (first row) and  (second row). Sampling distributions obtained by conducting nonparametric bootstrapping on the Rio Negro flood dataset. THE AMERICAN STATISTICIAN: GENERAL 79

used to fit the flood data. These results show that the condensed Poisson process performs best, with the COM-Poisson process ranking second in performance (based on AIC). The small difference in AIC between the condensed Poisson and COM-Poisson processes (Δi = 1.0) demonstrates substantial support that the COM-Poisson process is likewise a Kullback–Leibler best model (Burnham and Anderson 2002).

Figure 7. Contour plot of log-likelihood values (as described in Equation (7)) for various values of λ and ν when modeling the Rio Negro flood dataset as a COM-Poisson(λ, ν) process.

Example 3: Fetal Lamb Movements Over 15 sec Intervals. We restructure the fetal lamb movement data in Figure 2 to count the number of fetal lamb movements over successive 15 sec intervals. According to Method 3, we obtain the maximum likelihood estimates λ̂ = 0.277 [95% CI = (0.205,0.350)] and ν̂ = 0 [95% CI = (0.000,0.123)], and infer a geometric waiting time distribution with parameter 0.276 [95% CI = (0.205,0.350)].

When we increase the level of temporal aggregation for the COM-Poisson process, we lose precision relative to the disaggregated counts. Yet, the maximum likelihood methods used in Examples 1 and 3 yield similar parameter estimates. This result illustrates the robustness of the COM-Poisson modeling procedures to the choice of temporal aggregation.

6. Discussion

This work develops the COM-Poisson process to serve as a flexible stochastic process for count data containing equi-, over-, or under-dispersion. The significance of the COM-Poisson distribution and, hence, the corresponding process lies in its ability to represent a family of count processes that includes the Poisson, geometric, and Bernoulli processes. Thus, the COM-Poisson process not only serves as a flexible model for count data expressing a wide range of dispersion, but it can also aid in model selection through insights regarding the estimated dispersion parameter, for cases where that estimate takes a value associated with one of the special-case distributions. With this flexible stochastic process comes an equally flexible waiting time distribution, which can be used to represent the discrete or continuous waiting time associated with the count process. As demonstrated through the examples, the COM-Poisson process and COM-geometric distribution can be used to model and forecast rare events and small counts, regardless of whether data dispersion exists and of which type it is.

We developed various estimation approaches based on the potentially different data structures made available to the analyst: Methods 1 and 3, under the more common scenario where count data are provided over a given time frame; and Method 2, which considers the less likely scenario of only being provided with waiting time data and dispersion information. With regard to Method 2, we exclude the final waiting time, which is censored by the end of the data collection period. Excluding this censored waiting time is conceptually more appealing because its inclusion would imply that a "final event" had occurred (which could be incorrect). As we see from the examples considered, count occurrences are measured over relative time intervals. Because we do not know the point at which counting begins, we can only presume the recording process to start at some random point within the interevent time interval. This approach is consistent with assumptions made by Chatfield and Goodhardt (1973) for the condensed Poisson model. Future research can consider how to instead incorporate this censored information into the waiting-time approach, and the associated impact of this alternative. Additional future directions include considering penalized maximum likelihood estimation for various goals, such as predicting future waiting times. Nonetheless, as illustrated in Section 5, all of the methods have shown consistent results, demonstrating the robustness of the proposed process framework. The COM-Poisson process thus serves as a viable, flexible method for modeling count data where significant data dispersion is a possibility.

Accompanying R functions were developed for this project and are available upon request. Future work regarding the statistical computing aspects of this work includes the development of an R package containing these codes so that interested analysts may access and use these functions on CRAN in their applied work.
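Example 2 reports Method 2 estimates λ̂ = 0.259 and ν̂ = 6.221 obtained from only the dispersion index 0.800 and the mean interflood duration t̄ = 4.85 via Equations (8) and (9), which are not reproduced in this section. As a hedged sketch, and not the authors' R functions, assume Method 2 amounts to solving two conditions: the geometric mean waiting time 1/p, with p = 1 − 1/Z(λ, ν), equals t̄, and the COM-Poisson dispersion index var(Y)/E(Y) equals the observed index. All function names below are hypothetical, and nested bisection is simply a convenient solver choice.

```python
import math


def log_z(lam, nu, n_terms=100):
    """Truncated log Z(lam, nu) = log sum_{j>=0} lam**j / (j!)**nu (log-sum-exp).

    Truncation is accurate for lam well below 1 (the solution here is near 0.26).
    """
    logs = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def dispersion_index(lam, nu, n_terms=100):
    """var(Y) / E(Y) for Y ~ COM-Poisson(lam, nu), from the truncated pmf."""
    lz = log_z(lam, nu, n_terms)
    mean = second = 0.0
    for j in range(n_terms):
        p = math.exp(j * math.log(lam) - nu * math.lgamma(j + 1) - lz)
        mean += j * p
        second += j * j * p
    return (second - mean * mean) / mean


def bisect(f, lo, hi, iters=60):
    """Bisection root of f on [lo, hi]; requires a sign change on the bracket."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def method2_sketch(disp, mean_wait, nu_max=60.0):
    """Hypothetical Method 2-style solve for (lam, nu).

    Assumed conditions (our reading, not Equations (8)-(9) verbatim):
      1 / (1 - 1/Z(lam, nu)) = mean_wait   (geometric mean waiting time)
      var(Y) / E(Y)          = disp        (COM-Poisson dispersion index)
    """
    q = 1.0 / mean_wait  # implied geometric success probability per interval

    def lam_for(nu):
        # For fixed nu, 1 - 1/Z increases in lam, so bisect on lam in (0, 1).
        return bisect(lambda lam: 1.0 - math.exp(-log_z(lam, nu)) - q, 1e-9, 0.999)

    # Outer solve: vary nu until the dispersion index matches disp.
    nu_hat = bisect(lambda nu: dispersion_index(lam_for(nu), nu) - disp, 0.0, nu_max)
    return lam_for(nu_hat), nu_hat


lam_hat, nu_hat = method2_sketch(disp=0.800, mean_wait=4.85)
print(round(lam_hat, 3), round(nu_hat, 2))
```

With these inputs the solve lands near λ ≈ 0.259 and ν ≈ 6.2, consistent with the reported Method 2 estimates, which lends some support to this reading. Under the same assumption, as ν grows the dispersion index approaches the Bernoulli limit 1 − 1/t̄ ≈ 0.794, so an observed index of 0.800 sits just above that limit and small perturbations move ν̂ a great deal; this is in line with the wide interval (2.22, 41.04) and the bimodal Method 2 bootstrap distribution described for Figure 6.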

Table . Model comparison for Rio Negro River flood data, where i = AICi − AICmin as described in Burnham and Anderson (). Appendix: Information Matrix for Method 3 Model Poisson Neg-Bin Bernoulli Con. Poisson COM-Poisson The information matrix for Method 3 is described as follows. AIC . . . . . Given a random sample of event counts y1,...,yn from a sCOM- i . . . . . Poisson(λ, ν, s) distribution, the corresponding information matrix 80 L. ZHU ET AL. associated with the parameters, λ and ν,is Kimberly Sellers is provided through the ASA/NSF/Census Fellowship Pro- ⎛ ⎞ gram (U. S. Census Bureau Contract YA1323-14-SE-0122). Galit Shmueli ∂2 ln P(Y = y) ∂2 ln P(Y = y) was supported in part by grant 104-2410-H-007-001-MY2 from the Min- istry of Science and Technology in Taiwan. ⎜ ∂λ2 ∂λ∂ν ⎟ (λ, ν) =− · ⎜ ⎟ , IM3 n E ⎝ ⎠ ∂2 ln P(Y = y) ∂2 ln P(Y = y) ∂λ∂ν ∂ν2 References where Balakrishnan, N., and Kozubowski, T. J. (2008), “A Class of Weighted Pois- son Processes,” Statistics and Probability Letters, 78, 2346–2352. [72] ∂2 ln P(Y = y) Y s ∂Z(λ, ν) 2 Barndorff-Nielsen, O., and Yeo,G. F.(1969), “Negative Binomial Processes,” =− + Journal of Applied Probability, 6, 633–647. [71] ∂λ2 λ2 (λ, ν) 2 ∂λ [Z ] Brillinger, D. R. (1995), “Trend Analysis: Binary-Valued and Point Cases,” ∂2 (λ, ν) Stochastic Hydrology and Hydraulics, 9, 207–213. [77] − s Z , Z(λ, ν) ∂λ2 Burnham, K. P., and Anderson, D. R. (2002), Model Selection and Multi- model Inference: A Practical Information-Theoretic Approach (2nd ed.), ∂2 ln P(Y = y) s ∂Z(λ, ν) ∂Z(λ, ν) New York: Springer. [76] = Canty, A., and Ripley, B. (2015), boot: Bootstrap Functions, Version 1.3- ∂λ∂ν [Z(λ, ν)]2 ∂λ ∂ν 15. Available at http://cran.r-project.org/web/packages/boot/index.html ∂2 (λ, ν) [74] − s Z , (λ, ν) ∂λ∂ν Chatfield, C., and Goodhardt, G. J. 
(1973), “A Consumer Purchasing Model Z with Erlang Inter-Purchase Time,” Journal of the American Statistical ∂2 ( = ) ∂ (λ, ν) 2 Association, 68, 828–835. [76,79] ln P Y y = s Z Çinlar, E. (1975), Introduction to Stochastic Processes,NewYork:Prentice- ∂ν2 [Z(λ, ν)]2 ∂ν Hall,Inc.[71] Conway, R. W., and Maxwell, W. L. (1962), “A Queuing Model with State ∂2 (λ, ν) − s Z Dependent Service Rates,” Journal of Industrial Engineering, 12, 132– Z(λ, ν) ∂ν2 136. [72] Diggle, P. J., and Milne, R. K. (1983), “Binomial Quadrat Counts and Point ∂ (ν) 2 ∂2 (ν) − 1 C + 1 C , Processes,” Scandinavian Journal of Statistics, 10, 257–267. [71] [C(ν)]2 ∂ν C(ν) ∂ν2 Durrett, R. (2004), Essentials of Stochastic Processes,NewYork:Springer. [72] where Guttorp, P. (1995), Stochastic Modeling of Scientific Data,BocaRaton,FL: Chapman & Hall/CRC. [77]

y ν Johnson, N. L., Kemp, A. W., and Kotz, S. (2005), Univariate Discrete Dis- y tributions (3rd ed.), New York: Wiley. [76] C(ν) = , Kannan, D. (1979), An Introduction to Stochastic Processes,NewYork:Else- ,..., = ... x1 xs 0 x1 xs +···+ = x1 xs y vier North Holland. [72] ⎡  ⎤ Kokonendji, C. C., Mizère, D., and Balakrishnan, N. (2008), “Connec- y k ν ∂kC(ν) y y tions of the Poisson Weight Function to Overdispersion and Under- = ⎣ ln ⎦ , dispersion,” JournalofStatisticalPlanningandInference, 138, 1287– ∂νk ,..., = ...... x1 xs 0 x1 xs x1 xs 1296. [72] +···+ = x1 xs y Sellers, K. F. (2012), “A Generalized Statistical Control Chart for Over- k = 1, 2, 3,..., or Under-Dispersed Data,” Quality and Interna- tional, 28, 59–65. [78] ( ) Sellers, K. F., Shmueli, G., and Borle, S. (2011), “The COM-Poisson Model and E Y equals s times the COM-Poisson expectation defined in for Count Data: A Survey of Methods and Applications,” Applied Equation (1). Stochastic Models in Business and Industry, 28, 104–116. [72] Sellers, K. F., and Shmueli, G. (2010), “A Flexible Regression Model for Count Data,” Annals of Applied Statistics, 4, 943–961. [78] Acknowledgments ———— (2013), “Data Dispersion: Now You See it … Now You Don’t,” Communications in Statistics—Theory and Methods, 42, 3134– This article is released to inform interested parties of research and to 3147. [74,75] encourage discussion. The views expressed are those of the authors and not Shmueli,G.,Minka,T.P.,Kadane,J.B.,Borle,S.,andBoatwright,P. necessarilythoseoftheU.S.CensusBureau.TheauthorsthankRalphSnei- (2005), “A Useful Distribution for Fitting Discrete Data: Revival of the der (Monash University) for insightful discussion. Conway-Maxwell-Poisson Distribution,” Applied Statistics—Journal of the Royal Statistical Society, Series C, 54, 127–142. [72] Soldatov, A. P. (2011), Underdetermined system, Encyclopedia of Math- Funding ematics Available at http://www.encyclopediaofmath.org/index.php? 
title=Underdetermined_system&oldid=16734.[74] Support for Li Zhu was provided by the Georgetown Undergraduate Stephens, K. S. (2012), Reliability Data Analysis with Excel and Minitab, Research Opportunities Program (GUROP). Partial funding support for Milwaukee,WI:ASQQualityPress.[73]
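The quantities C(ν) and ∂^kC(ν)/∂ν^k in the Appendix are finite sums over all (x1, ..., xs) with x1 + ··· + xs = y, so for small y and s they can be checked by brute-force enumeration. The sketch below assumes, consistent with the Appendix derivatives, that the sCOM-Poisson pmf is P(Y = y) = λ^y C(ν)/[(y!)^ν Z(λ, ν)^s], the distribution of a sum of s independent COM-Poisson(λ, ν) counts. The function names are hypothetical and this is not the authors' R code. Two checks follow from the definitions: C(0) counts the compositions of y into s nonnegative parts, and C(1) = s^y by the multinomial theorem, so with ν = 1 the pmf reduces to the Poisson(sλ) pmf.

```python
import itertools
import math


def multinomial(xs):
    """Multinomial coefficient (x1 + ... + xs)! / (x1! ... xs!), exact integer."""
    out = math.factorial(sum(xs))
    for x in xs:
        out //= math.factorial(x)
    return out


def c_deriv(nu, y, s, k=0):
    """k-th nu-derivative of C(nu) = sum over x1+...+xs=y of multinomial(x)**nu.

    k = 0 returns C(nu) itself. Brute-force enumeration: small y and s only.
    """
    total = 0.0
    for xs in itertools.product(range(y + 1), repeat=s):
        if sum(xs) == y:
            m = multinomial(xs)
            total += m ** nu * math.log(m) ** k
    return total


def log_z(lam, nu, n_terms=100):
    """Truncated log Z(lam, nu) = log sum_j lam**j / (j!)**nu (log-sum-exp)."""
    logs = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(n_terms)]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def scom_pmf(y, lam, nu, s):
    """Assumed sCOM-Poisson pmf: lam**y * C(nu) / ((y!)**nu * Z(lam, nu)**s)."""
    log_p = y * math.log(lam) - nu * math.lgamma(y + 1) - s * log_z(lam, nu)
    return math.exp(log_p) * c_deriv(nu, y, s)


print(c_deriv(1.0, 4, 3))  # 81.0, i.e. 3**4
print(c_deriv(0.0, 4, 3))  # 15.0 compositions of 4 into 3 parts
print(sum(scom_pmf(y, 0.8, 1.5, 2) for y in range(26)))  # close to 1
```

The enumeration grows as (y + 1)^s, so this is only a check on the definitions; evaluating the information matrix in practice would call for a more efficient computation of C(ν) and its derivatives.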