Appl. Math. Inf. Sci. Lett. 6, No. 3, 113-121 (2018) 113 Applied Mathematics & Information Sciences Letters An International Journal

http://dx.doi.org/10.18576/amisl/060303

An Advanced Count Data Model with Applications in Genetics

Zahoor Ahmad*, Adil Rashid and T. R. Jan.

P.G Department of , University of Kashmir, Srinagar, India

Received: 19 Jan. 2018, Revised: 10 Mar. 2018, Accepted: 15 Mar. 2018. Published online: 1 Sep. 2018.

Abstract: The present paper introduces an advanced count model obtained by compounding the generalized negative binomial distribution with the Kumaraswamy distribution. The proposed model nests several existing compound distributions under specific parameter settings. Its similarity to an existing compound distribution is shown by means of reparameterization. The properties of the new model are discussed and explicit expressions are derived for the factorial moments. Further, the method of moments and maximum likelihood estimation are used to estimate the parameters. The potential of the proposed model is tested with the chi-square goodness of fit test by modeling real world count data sets from genetics.

Keywords: Generalized negative binomial distribution, Kumaraswamy distribution, compound distribution, factorial moments, count data.

1 Introduction

Statistical distributions are commonly applied to describe real world phenomena. Owing to their usefulness, their theory is widely studied and new distributions are developed. There has been an increased interest in developing generalized or generated families of new continuous distributions by introducing one or more additional shape parameters. The popularity and use of these new families of distributions have attracted the attention of statisticians, engineers, economists, demographers and other applied researchers. The reason might be that the additional parameter has proved helpful in improving the goodness of fit of the proposed family of continuous distributions, which can fit skewed data better than existing distributions. Many generalized families of continuous distributions have received a great deal of attention in recent years. We discuss below some generalized continuous distributions.

Over the last few decades researchers have obtained new probability distributions by different techniques such as compounding, the T-X family and transmutation, but compounding of probability distributions has received the most attention as an innovative and sound technique for obtaining new generalized probability distributions. The compounding of probability distributions enables us to obtain both discrete and continuous distributions. In several research papers it has been found that compound distributions are very flexible and can be used efficiently to model different types of data sets. With this in mind many compound distributions have been constructed. Sankaran (1970) obtained a compound of the Poisson distribution with the Lindley distribution. Zamani and Ismail (2010) constructed a new compound distribution by compounding the negative binomial with the one-parameter Lindley distribution, which provides a good fit for count data where the probability at zero has a large value. Adil Rashid and Jan obtained several compound distributions: (2013) a compound of the zero truncated generalized negative binomial distribution with the generalized beta distribution, (2014a) a compound of the Geeta distribution with the generalized beta distribution and (2014b) a compound of the Consul distribution with the generalized beta distribution. Most recently Adil Rashid and Jan (2014c) explored a mixture of the generalized negative binomial distribution with the generalized exponential distribution, which contains several compound distributions as sub cases, and showed that this model compares favourably with others when fitted to observed count data.

In this paper we propose a new count data model by compounding the generalized negative binomial distribution (GNBD) with the Kumaraswamy distribution (KSD), and some similarities of the proposed model with an already existing compound distribution will be shown.

*Corresponding author e-mail: [email protected]

2 Material and Methods

A discrete random variable X is said to have a generalized negative binomial distribution (GNBD) with parameters m, p and $\beta$ if its probability mass function is given by

$$f_1(x; m, p, \beta) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}(1-p)^{x}\,p^{\,m+\beta x-x}, \quad x = 0, 1, 2, \ldots \qquad (1)$$

where

$$0 < p < 1,\ m > 0,\ \beta p < 1;\ \beta \ge 1, \qquad \text{or} \qquad 0 < p < 1,\ m \in N,\ \beta = 0.$$

For $\beta = 1$, equation (1) reduces to the negative binomial distribution (NBD). If $m \in N$, for $\beta = 0$ one obtains from (1) the binomial distribution (BD) and for $\beta = 1$ the Pascal distribution; see N. Johnson, S. Kotz and A. Kemp (1992, p. 200).

A random variable X is said to have a Kumaraswamy distribution (KSD) if its pdf is given by

$$f_2(x; \alpha, \gamma) = \alpha\gamma\, x^{\alpha-1}\left(1-x^{\alpha}\right)^{\gamma-1}, \quad 0 < x < 1 \qquad (2)$$

where $\alpha, \gamma > 0$ are shape parameters. The raw moments of the Kumaraswamy distribution are given by

$$E(X^r) = \int_0^1 x^r f_2(x; \alpha, \gamma)\,dx = \frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{r}{\alpha}+1\right)}{\Gamma\!\left(\frac{r}{\alpha}+\gamma+1\right)} \qquad (3)$$

The Kumaraswamy distribution is a two-parameter continuous probability distribution introduced by Kumaraswamy (1980), but it is not very popular among statisticians because it has not been analyzed and investigated systematically in much detail. It is similar to the beta distribution but, unlike the beta distribution, it has a closed form cumulative distribution function, which makes it very simple to deal with. For more detailed properties see Kumaraswamy (1980) and Jones (2009). Usually the parameters m, p and $\beta$ in GNBD are fixed constants, but here we consider a problem in which the probability parameter p is itself a random variable following KSD with p.d.f. (2).

3 Definition of Proposed Model

If $X \mid p \sim GNBD(m, p, \beta)$, where p is itself a random variable following the Kumaraswamy distribution $KSD(\alpha, \gamma)$, then the distribution that results from marginalizing over p will be known as a compound of the generalized negative binomial distribution with the Kumaraswamy distribution, denoted by $GNBKSD(m, \beta, \alpha, \gamma)$. It may be noted that the proposed model is a discrete distribution since the parent distribution GNBD is discrete.

Theorem 3.1: The probability mass function of a compound of $GNBD(m, p, \beta)$ with $KSD(\alpha, \gamma)$ is given by

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+1\right)}{\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+\gamma+1\right)}$$

where $x = 0, 1, 2, \ldots$ and $m, \alpha, \gamma > 0$.

Proof: Using the definition in Section 3, the p.m.f. of a compound of $GNBD(m, p, \beta)$ with $KSD(\alpha, \gamma)$ can be obtained as

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \int_0^1 f_1(x \mid p)\, f_2(p)\, dp$$

Expanding $(1-p)^{x} = \sum_{j=0}^{x}\binom{x}{j}(-1)^{j}p^{j}$ gives

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\,\alpha\gamma\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\int_0^1 p^{\,m+\beta x-x+j+\alpha-1}\left(1-p^{\alpha}\right)^{\gamma-1} dp$$

Substituting $1 - p^{\alpha} = y$, we get

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\,\gamma\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\int_0^1 y^{\gamma-1}(1-y)^{\frac{m+\beta x-x+j}{\alpha}}\, dy$$

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\,\gamma\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\, B\!\left(\gamma,\ \frac{m+\beta x-x+j}{\alpha}+1\right)$$

$$f_{GNBKSD}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+1\right)}{\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+\gamma+1\right)} \qquad (4)$$

where $x = 0, 1, 2, \ldots$ and $m, \alpha, \gamma > 0$. From here on a random variable X following a compound of GNBD with KSD will be symbolized by $GNBKSD(m, \beta, \alpha, \gamma)$.

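The closed form (4) can be checked numerically against the defining integral. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, `alpha` and `gamma` stand for the two Kumaraswamy shape parameters, `beta` for the GNBD parameter, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import binom, gammaln


def gnbd_pmf(x, m, p, beta):
    """GNBD pmf in the form of equation (1): (1-p)^x * p^(m + beta*x - x)."""
    return (m / (m + beta * x) * binom(m + beta * x, x)
            * (1.0 - p) ** x * p ** (m + beta * x - x))


def ksd_pdf(p, alpha, gamma):
    """Kumaraswamy density, equation (2), with shape parameters alpha and gamma."""
    return alpha * gamma * p ** (alpha - 1) * (1.0 - p ** alpha) ** (gamma - 1)


def gnbksd_pmf(x, m, beta, alpha, gamma):
    """Compound pmf from the finite alternating sum in equation (4)."""
    j = np.arange(x + 1)
    c = (m + beta * x - x + j) / alpha
    # Gamma ratio evaluated on the log scale to avoid overflow.
    log_ratio = gammaln(gamma + 1) + gammaln(c + 1) - gammaln(c + gamma + 1)
    series = np.sum(binom(x, j) * (-1.0) ** j * np.exp(log_ratio))
    return m / (m + beta * x) * binom(m + beta * x, x) * series


def gnbksd_pmf_numeric(x, m, beta, alpha, gamma):
    """Same pmf obtained by integrating f1(x | p) * f2(p) over p in (0, 1)."""
    value, _ = quad(lambda p: gnbd_pmf(x, m, p, beta) * ksd_pdf(p, alpha, gamma),
                    0.0, 1.0)
    return value
```

For small counts the two functions should agree to within quadrature error. The alternating sum loses accuracy for large x, where direct integration is the safer route.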
4 Reparameterization Techniques

There are very few continuous probability distributions whose support lies between 0 and 1, so there is a limited choice when ascribing a suitable distribution to a probability parameter p. To remove this limitation researchers reparameterize the probability parameter by setting $p = e^{-\lambda}$, where $\lambda > 0$ is a random variable. So instead of ascribing a suitable probability distribution to the parameter p, researchers ascribe a suitable distribution to the parameter $\lambda$ by treating it as a random variable with support $\lambda > 0$, and there are numerous probability distributions whose support is $(0, \infty)$. In this section a similarity will be shown between the proposed model and a model obtained by compounding the generalized negative binomial distribution with the generalized exponential distribution through the reparameterization technique.

Proposition 4.1: The probability function of the proposed model coincides with that of the compound of GNBD with GED obtained through reparameterization.

Proof: If $X \mid \lambda \sim GNBD(m, p = e^{-\lambda}, \beta)$, where $\lambda$ is itself a random variable following a generalized exponential distribution (GED) with probability density function

$$f_3(\lambda; \alpha, \gamma) = \alpha\gamma\left(1-e^{-\alpha\lambda}\right)^{\gamma-1} e^{-\alpha\lambda}, \quad \lambda > 0;\ \alpha, \gamma > 0 \qquad (5)$$

then the distribution that results from marginalizing over $\lambda$ gives a compound of GNBD with GED:

$$f_{GNBGED}(X; m, \beta, \alpha, \gamma) = \int_0^{\infty} f_1(x \mid \lambda)\, f_3(\lambda; \alpha, \gamma)\, d\lambda$$

$$f_{GNBGED}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\int_0^{\infty} e^{-\lambda(m+\beta x-x+j)}\, f_3(\lambda; \alpha, \gamma)\, d\lambda$$

$$f_{GNBGED}(X; m, \beta, \alpha, \gamma) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+1\right)}{\Gamma\!\left(\frac{m+\beta x-x+j}{\alpha}+\gamma+1\right)} \qquad (6)$$

where $x = 0, 1, 2, \ldots$ and $m, \alpha, \gamma > 0$. Interestingly, this gives rise to the same probability function as the probability function (4) of the proposed model. The probability function defined in (6) was obtained by Adil Rashid and Jan (2014c). Hence our model can be treated as a simple and easy alternative, since it has been obtained without reparameterization.
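The agreement between (4) and (6) can also be seen from a direct change of variables. The following is a short sketch, assuming the GED density in (5) is written with rate $\alpha$ and shape $\gamma$, so that its parameters line up with the Kumaraswamy shapes in (2):

```latex
% Assumption: \lambda follows the GED with density
%   f_3(\lambda) = \alpha\gamma (1 - e^{-\alpha\lambda})^{\gamma-1} e^{-\alpha\lambda}, \quad \lambda > 0.
% Put p = e^{-\lambda}, so that \lambda = -\ln p and |d\lambda/dp| = 1/p.
\begin{aligned}
f_P(p) &= f_3(-\ln p)\,\frac{1}{p}
        = \alpha\gamma\,\bigl(1 - p^{\alpha}\bigr)^{\gamma-1}\,p^{\alpha}\,\frac{1}{p} \\
       &= \alpha\gamma\,p^{\alpha-1}\bigl(1 - p^{\alpha}\bigr)^{\gamma-1},
          \qquad 0 < p < 1 .
\end{aligned}
```

The right-hand side is the Kumaraswamy density, so marginalizing over $\lambda$ with $p = e^{-\lambda}$ is the same operation as marginalizing over p directly.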


5 Nested Distributions

In this section we show that the proposed model nests different models under specific parameter settings.

Proposition 5.1: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = 1$ we get a compound of the negative binomial distribution with the Kumaraswamy distribution.

Proof: For $\beta = 1$ in (1) GNBD reduces to the negative binomial distribution (NBD), hence a compound of NBD with KSD follows from (4) by simply substituting $\beta = 1$ in it:

$$f_{NBKSD}(X; m, \alpha, \gamma) = \binom{x+m-1}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{m+j}{\alpha}+1\right)}{\Gamma\!\left(\frac{m+j}{\alpha}+\gamma+1\right)} \qquad (7)$$

for $x = 0, 1, 2, \ldots$ and $m, \alpha, \gamma > 0$. The compound distribution displayed in (7) was introduced by Adil and Jan (2014).

Proposition 5.2: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = 0$ and m a positive integer, we obtain a compound of the binomial distribution with the Kumaraswamy distribution.

Proof: For $\beta = 0$ and $m \in N$ in (1), GNBD reduces to the binomial distribution, therefore a compound of the binomial distribution with the Kumaraswamy distribution follows from (4) by simply putting $\beta = 0$ and $m \in N$ in it:

$$f_{BKSD}(X; m, \alpha, \gamma) = \binom{m}{x}\sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{m+j-x}{\alpha}+1\right)}{\Gamma\!\left(\frac{m+j-x}{\alpha}+\gamma+1\right)}$$

for $x = 0, 1, 2, \ldots, m$; $\alpha, \gamma > 0$; $m \in N$, which is the probability mass function of a compound of the binomial distribution with the Kumaraswamy distribution.

Proposition 5.3: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = m = 1$ we get a compound of the geometric distribution with the Kumaraswamy distribution.

Proof: For $\beta = m = 1$ in (1) GNBD reduces to the geometric distribution (GD), hence a compound of GD with KSD follows from (4) by simply substituting $\beta = m = 1$ in it:

$$f_{GKSD}(X; \alpha, \gamma) = \sum_{j=0}^{x}\binom{x}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{j+1}{\alpha}+1\right)}{\Gamma\!\left(\frac{j+1}{\alpha}+\gamma+1\right)}$$

for $x = 0, 1, 2, \ldots$ and $\alpha, \gamma > 0$, which is the probability mass function of a compound of GD with KSD.

Proposition 5.4: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\alpha = \gamma = 1$ we obtain a compound of GNBD with the uniform distribution.

Proof: Putting $\alpha = \gamma = 1$ in (2), KSD reduces to the uniform (0,1) distribution, therefore a compound of GNBD with the uniform distribution follows from (4) by simply putting $\alpha = \gamma = 1$ in it:

$$f_{GNBUD}(X; m, \beta) = \frac{m}{m+\beta x}\binom{m+\beta x}{x}\sum_{j=0}^{x}\binom{x}{j}\frac{(-1)^{j}}{m+\beta x-x+j+1}$$

for $x = 0, 1, 2, \ldots$ and $m > 0$, which is the probability mass function of a compound of GNBD with the uniform distribution.

Proposition 5.5: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = 1$ and $\alpha = \gamma = 1$ we get a compound of the negative binomial distribution with the uniform distribution.

Proof: For $\beta = 1$ in (1), GNBD reduces to the negative binomial distribution and for $\alpha = \gamma = 1$ in (2), KSD reduces to the uniform (0,1) distribution. Therefore a compound of NBD with the uniform distribution follows from (4) by simply putting $\beta = \alpha = \gamma = 1$ in it:

$$f_{NBUD}(X; m) = \binom{x+m-1}{x}\sum_{j=0}^{x}\binom{x}{j}\frac{(-1)^{j}}{m+j+1}$$

for $x = 0, 1, 2, \ldots$ and $m > 0$, which is the probability mass function of a compound of NBD with the uniform distribution.

Proposition 5.6: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = 0$, $m \in N$ and $\alpha = \gamma = 1$ we get a compound of the binomial distribution with the uniform distribution.

Proof: For $\beta = 0$ and $m \in N$ in (1), GNBD reduces to the binomial distribution and for $\alpha = \gamma = 1$ in (2), KSD reduces to the uniform (0,1) distribution. Thus a compound of the binomial distribution with the uniform (0,1) distribution is obtained from (4) by letting $\beta = 0$, $m \in N$ and $\alpha = \gamma = 1$ in it:

$$f_{BUD}(X; m) = \binom{m}{x}\sum_{j=0}^{x}\binom{x}{j}\frac{(-1)^{j}}{m-x+j+1}$$

for $x = 0, 1, 2, \ldots, m$ and $m \in N$, which is the probability mass function of a mixture of the binomial distribution with the uniform distribution.

Proposition 5.7: If $X \sim GNBKSD(m, \beta, \alpha, \gamma)$ then by setting $\beta = m = 1$ and $\alpha = \gamma = 1$ we obtain a compound of the geometric distribution with the uniform distribution.

Proof: For $\beta = m = 1$ in (1), GNBD reduces to the geometric distribution and for $\alpha = \gamma = 1$ the Kumaraswamy distribution reduces to the U(0,1) distribution. Hence a compound of the geometric distribution with the uniform distribution can be obtained from (4) by substituting $\beta = m = 1$ and $\alpha = \gamma = 1$ in it:

$$f_{GUD}(X) = \sum_{j=0}^{x}\binom{x}{j}\frac{(-1)^{j}}{j+2}$$

for $x = 0, 1, 2, \ldots$, which is the probability function of GD with the U(0,1) distribution.

6 Mean and Variance of the Proposed Model

In order to obtain the l-th moment of the proposed model $GNBKSD(m, \beta, \alpha, \gamma)$ about the origin we need to apply the following well-known results of probability theory:

(i) Conditional expectation identity: $E(X^{l}) = E_p\left(E(X^{l} \mid P)\right)$, and

(ii) Conditional variance identity: $V(X) = E_p\left(Var(X \mid P)\right) + Var_p\left(E(X \mid P)\right)$.

Since $X \mid p \sim GNBD(m, p, \beta)$ where p is itself a random variable following $KSD(\alpha, \gamma)$, we have

$$E(X^{l}) = E_p\left(X^{l} \mid P\right)$$

$$E(X^{l}) = m^{l}\, E_p\!\left[\left(\frac{1-p}{p}\right)^{l}\right]$$

$$E(X^{l}) = m^{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^{j}\int_0^1 p^{\,j-l} f_2(p; \alpha, \gamma)\, dp$$

Using (3) we get

$$E(X^{l}) = m^{l}\sum_{j=0}^{l}\binom{l}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{j-l}{\alpha}+1\right)}{\Gamma\!\left(\frac{j-l}{\alpha}+\gamma+1\right)}$$

This step replaces X by the conditional mean $m(1-p)/p$, which is the conditional mean of the NBD case $\beta = 1$; the expression is therefore exact for $l = 1$, and the gamma functions are finite only for $\alpha > l$. For $l = 1$, writing r for the NBD parameter, we get the mean of NBKSD

$$E(X) = r\left[\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)} - 1\right] = m_1$$

Similarly we can find the variance of NBKSD using the conditional variance identity (ii):

$$V(X) = E_p\!\left[\frac{r(1-p)}{p^{2}}\right] + Var_p\!\left[\frac{r(1-p)}{p}\right]$$

$$V(X) = (r^{2}+r)\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{2}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{2}{\alpha}\right)} - r\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)} - \left[r\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)}\right]^{2}$$
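The mean and variance expressions for NBKSD can be sanity-checked by integrating the conditional moments of the NBD against the Kumaraswamy density. The sketch below is illustrative only: the function names are hypothetical, `r` is the NBD parameter, `alpha` and `gamma` are the two Kumaraswamy shape parameters, and the closed forms are assumed to require `alpha > 2` so that both inverse moments of p exist.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln


def inverse_moment(k, alpha, gamma):
    """E[p**(-k)] for p ~ KSD(alpha, gamma); finite only when alpha > k."""
    if alpha <= k:
        raise ValueError("E[p**(-k)] is infinite unless alpha > k")
    return float(np.exp(gammaln(gamma + 1) + gammaln(1 - k / alpha)
                        - gammaln(gamma + 1 - k / alpha)))


def nbksd_mean(r, alpha, gamma):
    """Mean of NBKSD: r * (E[1/p] - 1)."""
    return r * (inverse_moment(1, alpha, gamma) - 1.0)


def nbksd_variance(r, alpha, gamma):
    """Variance of NBKSD from the conditional variance identity (ii)."""
    a1 = inverse_moment(1, alpha, gamma)
    a2 = inverse_moment(2, alpha, gamma)
    return (r ** 2 + r) * a2 - r * a1 - (r * a1) ** 2


def ksd_expectation(g, alpha, gamma):
    """E[g(p)] by numerical integration against the Kumaraswamy density."""
    def pdf(p):
        return alpha * gamma * p ** (alpha - 1) * (1 - p ** alpha) ** (gamma - 1)
    value, _ = quad(lambda p: g(p) * pdf(p), 0.0, 1.0)
    return value
```

A usage pattern is to compare `nbksd_mean(r, alpha, gamma)` with `ksd_expectation(lambda p: r * (1 - p) / p, alpha, gamma)`; the two should agree up to quadrature error whenever `alpha > 1`.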


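7 Factorial Moments of a Compound of Negative Binomial Distribution with Kumaraswamy Distribution

Theorem 7.1: The factorial moment of order k of a compound of the negative binomial distribution with the Kumaraswamy distribution is given by the expression

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{j-k}{\alpha}+1\right)}{\Gamma\!\left(\frac{j-k}{\alpha}+\gamma+1\right)}$$

where $x = 0, 1, 2, \ldots$ and $r, \alpha, \gamma > 0$.

Proof: The factorial moment of order k of NBD is

$$m_k(X \mid p) = \frac{\Gamma(r+k)}{\Gamma(r)}\,\frac{(1-p)^{k}}{p^{k}} \qquad (8)$$

Since p itself is a random variable following KSD, one obtains the factorial moment of the proposed model by using the definition

$$\mu_{[k]}(X) = E_p\left[m_k(X \mid p)\right] = \frac{\Gamma(r+k)}{\Gamma(r)}\, E_p\!\left[\left(\frac{1-p}{p}\right)^{k}\right]$$

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\,\alpha\gamma\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\int_0^1 p^{\,j-k+\alpha-1}\left(1-p^{\alpha}\right)^{\gamma-1} dp$$

Substituting $1 - p^{\alpha} = z$, we get

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\,\gamma\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\int_0^1 z^{\gamma-1}(1-z)^{\frac{j-k}{\alpha}}\, dz$$

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\,\gamma\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\, B\!\left(\gamma,\ \frac{j-k}{\alpha}+1\right)$$

$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{j-k}{\alpha}+1\right)}{\Gamma\!\left(\frac{j-k}{\alpha}+\gamma+1\right)}$$

where $x = 0, 1, 2, \ldots$ and $r, \alpha, \gamma > 0$; the integrals are finite for $\alpha > k$. For $k = 1$ we get the mean of NBKSD

$$\mu_{[1]}(X) = r\left[\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)} - 1\right] = m_1$$

and for $k = 2$

$$\mu_{[2]}(X) = r(r+1)\left[\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{2}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{2}{\alpha}\right)} - 2\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)} + 1\right]$$

$$E(X^{2}) = \mu_{[2]}(X) + \mu_{[1]}(X) = \mu_{[2]}(X) + m_1 = m_2,$$

and similarly we can find the third moment $E(X^{3})$.

$$V(X) = m_2 - m_1^{2}$$

$$V(X) = (r^{2}+r)\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{2}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{2}{\alpha}\right)} - r\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)} - \left[r\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(1-\frac{1}{\alpha}\right)}{\Gamma\!\left(\gamma+1-\frac{1}{\alpha}\right)}\right]^{2}$$

The standard deviation of the $NBKSD(r, \alpha, \gamma)$ model is $Sd = \sqrt{V(X)}$.

Corollary 7.2: The factorial moment of order k of a compound of NBD with the uniform distribution is given by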


$$\mu_{[k]}(X) = \frac{\Gamma(r+k)}{\Gamma(r)}\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\frac{\Gamma(j-k+1)}{\Gamma(j-k+2)}$$

where $x = 0, 1, 2, \ldots$ and $r > 0$.

Proof: Since $KSD(\alpha, \gamma)$ reduces to the Uniform(0,1) distribution for $\alpha = \gamma = 1$, the factorial moment of order k of a compound of NBD with the uniform distribution can be obtained from Theorem 7.1 by simply substituting $\alpha = \gamma = 1$ in it. Since $\Gamma(j-k+1)/\Gamma(j-k+2) = 1/(j-k+1)$, the term $j = k-1$ is undefined, which reflects that $E(p^{-1})$ diverges when p is uniform on (0,1); the expression should therefore be read formally.

Corollary 7.3: The factorial moment of a compound of GD with KSD is given by

$$\mu_{[k]}(X) = \Gamma(k+1)\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\frac{\Gamma(\gamma+1)\,\Gamma\!\left(\frac{j-k}{\alpha}+1\right)}{\Gamma\!\left(\frac{j-k}{\alpha}+\gamma+1\right)}$$

where $x = 0, 1, 2, \ldots$ and $\alpha, \gamma > 0$.

Proof: For $r = 1$ NBD reduces to GD, and the factorial moment of order k of a compound of the geometric distribution with KSD is obtained from Theorem 7.1 by putting $r = 1$ in it.

Corollary 7.4: The factorial moment of a compound of the geometric distribution with the uniform distribution is given by

$$\mu_{[k]}(X) = \Gamma(k+1)\sum_{j=0}^{k}\binom{k}{j}(-1)^{j}\,\frac{\Gamma(j-k+1)}{\Gamma(j-k+2)}$$

where $x = 0, 1, 2, \ldots$

Proof: Substituting $r = 1$ and $\alpha = \gamma = 1$ in Theorem 7.1 we get the result.

8 Parameter Estimation

In this section the estimation of the parameters of the $GNBKSD(m, \beta, \alpha, \gamma)$ model is discussed through the method of moments and maximum likelihood estimation.

8.1 Method of Moments

In order to estimate the four unknown parameters of the $GNBKSD(m, \beta, \alpha, \gamma)$ model by the method of moments we need to equate the first four sample moments with their corresponding population moments:

$$m_1 = \mu_1;\quad m_2 = \mu_2;\quad m_3 = \mu_3 \quad\text{and}\quad m_4 = \mu_4$$

where $\mu_i = \frac{1}{n}\sum_{k=1}^{n} x_k^{\,i}$ is the ith sample moment and $m_i$ is the corresponding ith population moment. The solution for $\hat{m}, \hat{\beta}, \hat{\alpha}$ and $\hat{\gamma}$ may be obtained by solving the above equations simultaneously.

8.2 Maximum Likelihood Estimation

The estimation of the parameters of the $GNBKSD(m, \beta, \alpha, \gamma)$ model via the maximum likelihood estimation method requires the log likelihood function

$$\mathcal{L}(X; m, \beta, \alpha, \gamma) = \log L(X; m, \beta, \alpha, \gamma) = n\log(m) + \sum_{i=1}^{n}\log\binom{m+\beta x_i}{x_i} - \sum_{i=1}^{n}\log(m+\beta x_i) + \sum_{i=1}^{n}\log S_i$$

where

$$S_i = \sum_{j=0}^{x_i}\binom{x_i}{j}(-1)^{j}\,\gamma\, B\!\left(\gamma,\ \frac{m+\beta x_i-x_i+j}{\alpha}+1\right).$$

The maximum likelihood estimate of $\Theta = (\hat{m}, \hat{\beta}, \hat{\alpha}, \hat{\gamma})^{T}$ can be obtained by differentiating the log likelihood with respect to the unknown parameters m, $\beta$, $\alpha$ and $\gamma$ respectively and then equating the derivatives to zero:

$$\frac{\partial \mathcal{L}}{\partial m} = \frac{n}{m} + \sum_{i=1}^{n}\frac{\frac{\partial}{\partial m}\binom{m+\beta x_i}{x_i}}{\binom{m+\beta x_i}{x_i}} - \sum_{i=1}^{n}\frac{1}{m+\beta x_i} + \sum_{i=1}^{n}\frac{1}{S_i}\frac{\partial S_i}{\partial m} = 0$$

$$\frac{\partial \mathcal{L}}{\partial \beta} = \sum_{i=1}^{n}\frac{\frac{\partial}{\partial \beta}\binom{m+\beta x_i}{x_i}}{\binom{m+\beta x_i}{x_i}} - \sum_{i=1}^{n}\frac{x_i}{m+\beta x_i} + \sum_{i=1}^{n}\frac{1}{S_i}\frac{\partial S_i}{\partial \beta} = 0$$

$$\frac{\partial \mathcal{L}}{\partial \alpha} = \sum_{i=1}^{n}\frac{1}{S_i}\frac{\partial S_i}{\partial \alpha} = 0$$

$$\frac{\partial \mathcal{L}}{\partial \gamma} = \sum_{i=1}^{n}\frac{1}{S_i}\frac{\partial S_i}{\partial \gamma} = 0$$
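In practice the log likelihood above is maximised numerically. The following is a minimal sketch under stated assumptions rather than the authors' procedure: it swaps Newton-Raphson for SciPy's bounded quasi-Newton method (L-BFGS-B), uses the observed frequencies of Table 1, treats the open class "7+" as exactly 7, and picks the starting point and bounds arbitrarily. All function and variable names are hypothetical; `alpha` and `gamma` denote the two Kumaraswamy shape parameters.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import binom, gammaln

# Observed chromatid aberration counts from Table 1.
# Assumption: the open class "7+" is treated as exactly 7.
counts = np.arange(8)
freq = np.array([268, 87, 26, 9, 4, 2, 1, 3])


def gnbksd_pmf(x, m, beta, alpha, gamma):
    """Compound pmf as the finite alternating sum of gamma-function ratios."""
    j = np.arange(x + 1)
    c = (m + beta * x - x + j) / alpha
    log_ratio = gammaln(gamma + 1) + gammaln(c + 1) - gammaln(c + gamma + 1)
    series = np.sum(binom(x, j) * (-1.0) ** j * np.exp(log_ratio))
    return m / (m + beta * x) * binom(m + beta * x, x) * series


def neg_loglik(theta):
    """Negative log likelihood of the grouped data, weighted by frequency."""
    m, beta, alpha, gamma = theta
    pmf = np.array([gnbksd_pmf(int(x), m, beta, alpha, gamma) for x in counts])
    # Clip guards against log(0) if rounding pushes a tiny probability to <= 0.
    return -np.sum(freq * np.log(np.clip(pmf, 1e-300, None)))


# Hypothetical starting values and box constraints (m > 0, beta >= 1, shapes > 0).
start = np.array([1.0, 1.0, 2.0, 2.0])
bounds = [(0.05, 20.0), (1.0, 5.0), (0.1, 50.0), (0.1, 50.0)]
res = minimize(neg_loglik, start, method="L-BFGS-B", bounds=bounds)
m_hat, beta_hat, alpha_hat, gamma_hat = res.x
```

With four parameters and only eight distinct counts the likelihood surface can be flat, so it is worth rerunning from several starting points and comparing the attained values of `res.fun`.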

These four derivative equations cannot be solved analytically, therefore $\hat{m}, \hat{\beta}, \hat{\alpha}$ and $\hat{\gamma}$ will be obtained by maximizing the log likelihood function numerically using the Newton-Raphson method, which is a powerful technique for solving equations iteratively and numerically.

9 Application of the Proposed GNBKSD Model

The classical Poisson distribution plays an important role in modeling count data processes, but it requires the strong condition of independent successive events, which may not be characteristic of the count data under consideration, such as the data given in Table 1. Furthermore, the equality of mean and variance of the observed counts is hardly satisfied in practice. The negative binomial distribution can be used in such cases, but it is better suited to over dispersed count data that are not necessarily heavy tailed; in such situations compound distribution models serve efficiently well.

9.1 Application in Genetics

Genetics is the science which deals with the mechanisms responsible for similarities and differences among closely related species. The term 'genetic' is derived from the Greek word 'genesis', meaning to grow into or to become. So genetics is the study of heredity and hereditary variations: it is the study of the transmission of body features, that is, similarities and differences, from parents to offspring and of the laws related to this transmission. Any difference between individual organisms or groups of organisms of any species, caused either by genetic difference or by the effect of environmental factors, is called variation. Variation can be shown in physical appearance, metabolism, behavior, learning and mental ability, and other obvious characters. In this section the potential of the proposed model is justified by fitting it to the reported genetics count data sets of Catcheside et al.

Table 1. Distribution of number of chromatid aberrations (0.2 g chinon 1, 24 hours)

Number of      Observed     Fitted distribution
aberrations    frequency    Poisson    NBD      NBKSD    GNBKSD
0              268          231.3      279.1    269.8    265.98
1              87           126.1      71.1     82.5     86.34
2              26           34.7       27.5     27.8     14.36
3              9            6.3        11.8     10.6     7.07
4              4            0.8        5.3      4.5      4.81
5              2            0.1        2.5      2.1      3.2
6              1            0.1        1.1      1.6      2.39
7+             3            0.1        0.5      0.5      1.81
Total          400

Parameter estimates:
Poisson: $\hat{\theta} = 0.54$
NBD: $\hat{r} = 0.49$, $\hat{p} = 0.48$
NBKSD: $\hat{r} = 1.74$, $\hat{\alpha} = 5.94$, $\hat{\gamma} = 3.79$
GNBKSD: $\hat{m} = 1.45$, $\hat{\beta} = 2$, $\hat{\alpha} = 6.01$, $\hat{\gamma} = 3.01$

p value: Poisson 0.00, NBD 0.09, NBKSD 0.37, GNBKSD 0.64

10 Conclusions

In this paper we have proposed a new model by compounding GNBD with KSD and it has been shown that the proposed model nests different compound distributions. Furthermore, we have derived several properties of the proposed model such as factorial moments, mean and variance. The similarity of the proposed model with an existing compound distribution has also been discussed. In addition, parameter estimation of the proposed model has been discussed by means of the method of moments and MLE. Finally, the application of the proposed model has been explored in genetics: based on the chi-square goodness of fit test, it has been shown that the $GNBKSD(m, \beta, \alpha, \gamma)$ model offers a better fit than the classical Poisson distribution and NBKSD. We hope that the proposed model will serve as an alternative to various models available in the literature.

References

[1] Adil, R. and Jan, T.R. A mixture of generalized negative binomial distribution with generalized exponential distribution. J. Stat. Appl. Pro., 3(3), 451-464, (2014c).
[2] Adil, R. and Jan, T.R. A compound of zero truncated generalized negative binomial distribution with the generalized beta distribution. Journal of Reliability and Statistical Studies, 6(1), 11-19, (2013).
[3] Adil, R. and Jan, T.R. A compound of Geeta distribution with the generalized beta distribution. Journal of Modern Applied Statistical Methods, 13(1), 278-286, (2014a).
[4] Adil, R. and Jan, T.R. A compound of Consul distribution with the generalized beta distribution. Journal of Applied Probability and Statistics Methods, 9(2), 17-24, (2014b).
[5] Jones, M.C. Kumaraswamy's distribution: A beta-type distribution with some tractability advantages. Statistical Methodology, 6(1), 70-81, (2009). doi.org/10.1016/j.stamet.2008.04.001
[6] Kumaraswamy, P. A generalized probability density function for double-bounded random processes. Journal of Hydrology, 46(1), 79-88, (1980). http://dx.doi.org/10.1016/0022-1694(80)90036-0
[7] Sankaran, M. The discrete Poisson-Lindley distribution. Biometrics, 26, 145-149, (1970).
[8] Zamani, H. and Ismail, N. Negative binomial-Lindley distribution and its application. J. Mathematics and Statistics, 1, 4-9, (2010).
