The Length-Biased Versus Random Sampling for the Binomial and Poisson Events Makarand V
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Modern Applied Statistical Methods Volume 12 | Issue 1 Article 10 5-1-2013 The Length-Biased Versus Random Sampling for the Binomial and Poisson Events Makarand V. Ratnaparkhi Wright State University, [email protected] Uttara V. Naik-Nimbalkar Pune University, Pune, India Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons Recommended Citation Ratnaparkhi, Makarand V. and Naik-Nimbalkar, Uttara V. (2013) "The Length-Biased Versus Random Sampling for the Binomial and Poisson Events," Journal of Modern Applied Statistical Methods: Vol. 12 : Iss. 1 , Article 10. DOI: 10.22237/jmasm/1367381340 Available at: http://digitalcommons.wayne.edu/jmasm/vol12/iss1/10 This Regular Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState. Journal of Modern Applied Statistical Methods Copyright © 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 54-57 1538 – 9472/13/$95.00 The Length-Biased Versus Random Sampling for the Binomial and Poisson Events Makarand V. Ratnaparkhi Uttara V. Naik-Nimbalkar Wright State University, Pune University, Dayton, OH Pune, India The equivalence between the length-biased and the random sampling on a non-negative, discrete random variable is established. The length-biased versions of the binomial and Poisson distributions are discussed. Key words: Length-biased data, weighted distributions, binomial, Poisson, convolutions. Introduction binomial distribution (probabilities), which was The occurrence of so-called length-biased data appropriate at that time. The methodology for has been documented by many researchers. No such discrete data analysis was formalized by formal mathematical definition of length- Rao (1965) who introduced the concept of biasedness exists; however, if for collected data weighted distributions (or in informal the probability of inclusion of an observation in nomenclature, the distorted distributions). Rao a sample is proportional to the magnitude of an also considered the weighted distributions for observation then the data is referred to as the the analysis of various discrete data sets. length-biased data. Further, if the probability of Recently, Gao, et al. (2011) found that the inclusion depends on a certain known function length-biased Poisson (events) data arise in (weight function) of an observation then it is bioinformatics. There are a number of fields referred to as the size-biased data. such as ecology, geology, medical science and The realization of such data as a sample, engineering where the data is length-biased and as opposed to the realization of a well-defined the researchers have used the weighted random sampling procedure, appeared for the distribution for the analysis of such data. first time in a paper by Fisher (1934). In This study follows Ratnaparkhi and particular, the data collected for estimating the Naik-Nimbalkar (2012) regarding the length- proportion of individuals having certain rare biased data arising in oil filed exploration and genetic traits such as albinism are length-biased. seeks to associate length-biased data with a However, in statistical literature, the term random sampling procedure. It is hoped that length-biased data was introduced during the these efforts will lead to new methods for 1960s. Fisher (1934) referred to the data building appropriate models for length-biased collection procedure for counting the number of data and for the estimation of the related albino children in a family as the method of parameters using such data. Since, the general ascertainment, which clearly is not a random discussion of length-biasedness (regardless of sampling procedure. For the analysis of the whether the data is on continuous or discrete collected data he considered the modifications of random variables is more involved, here, the discussion is limited to the discrete random variables. Further, the binomial and Poisson distributions are commonly used for modeling Makarand V. Ratnaparkhi is a Professor of discrete data. Thus, the derivation of results and Statistics. Email him at: the discussion are restricted to these [email protected]. Uttara V. distributions; however, the extensions and Naik-Nimbalkar is a Professor of Statistics. modifications to other discrete distributions are Email her at: [email protected]. possible. 54 RATNAPARKHI & NAIK-NIMBALKAR Length-Biased Data versus Random Sample simple or are the known properties of the Data: Examples distributions of random variables. Therefore, the detailed proofs are omitted for brevity. For Example 1: Estimation of Proportion of Families future reference, the notations and definitions with Albino Children (Fisher, 1934) used herein are listed below. An albino child is located and his/her family is included in a sample. Therefore, larger • X: A random variable representing the the number of albino children in a family higher populations of interest (the original random is the probability of inclusion of the family in variable) X is assumed to be a non-negative, the sample. Let X= the number of albino non-degenerate discrete random variable. children in a family. Clearly, the data on X is • Y, Z1, Z2: Other random variables that will length-biased. The binomial probabilities are be introduced as needed. considered for the analysis if the observed • f (;)xθ : The pdf of the original frequencies of the values of X. X distribution of a random variable X where θ is a scalar or a vector of parameters. Example 2: Estimation of the Population of • θ Moose Using Aerial Transect Sampling gxX (;): The pdf of the length-biased Moose, in general, stay and move as θ version of f X (;)x . groups. The group is located (aerial survey) and • LBS: Length-biased sampling the numbers of animals in a group are recorded by a team on the ground. For locating the group, The pdf of the length-biased version of at least one animal from the group must be θ is given by visible while traveling on a randomly selected f X (;)x transect in the aerial survey of the population. θθ=<∞ The Poisson distribution is considered for the gxXX(;) xfx (;)/[ EXEX ],[ ] . analysis of these data. Note that larger the (1) number of animals in group, the higher is the probability of sighting at least one animal and For a more general case of (1), x is replaced by hence the length-biasedness in the collected wx() > 0, and EX[] by EwX[()].<∞ data. • ()= X Example 3: Length-Biased Data on RNA GtX Et[]:The probability generating Sequences (Gao, et al., 2011) function (pgf) of X . The subscript of Gt() The quantification of transcripts (RNA- denotes the pgf of the corresponding sequences) and related properties, such as the variable. differential expression of the transcripts, arise in • B(,np ): The binomial distribution with microarray data analysis. The differential expression of longer transcripts is more likely to parameters n and p. • be identified than that of a shorter transcript with LBB(,) m p : The length-biased version of the same effect size. This situation is described B(,np ). as length-biasedness in the related data. For the • P()λ : The Poisson distribution with analysis of such data, the Poisson distribution is parameter λ. considered as a model. • LBP()λ : The length-biased version of the Methodology Poisson distribution. The derivations of the results for this study are related to the distributions arising in the analysis Results of length-biased data on discrete random Convolution of the Poisson and Degenerate variables. These are based on certain definitions Random Variables and the probability generating function of the Suppose that the original distribution of random variables. The derivations are either a random variable XP ()λ with pdf 55 LENGTH-BIASED VS. RANDOM SAMPLING FOR BINOMIAL AND POISSON EVENTS f (;)xexλλ= −λ x /!, interpretation must come from the description of X (2) =∞ the research problem form where such data is x 0,1, 2, , . arising in practice. For practical purposes, if it is known then using (1), the pdf of the length biased that while sampling for the original random Poisson distribution is given by variable X the probability of inclusion of an observation is proportional to the magnitude of gx(;)λλ=− e−−λ (1)x /( x 1)!, the observation, a researcher should define the X (3) x =∞1, 2, , . random variables Z1 and Y appropriately and then collect sample data on these variables independently, for example, take a sample first λ =∞ Further, if Z11Pz ( ), 0,1,2, , , then on Y followed by a sample on Z1 and then define Gt( )=−− exp(λ (1 t )). And, if Y has a X = Z1 + Y for practical purposes. It is likely that Z1 such a sampling plan is more involved in terms = degenerate distribution at Y = 1 with GtY () t, of its execution and planning as compared to the =−−λ practice of collecting the data on X and calling it then,Gtt()ZY+ ( ) exp( (1 t )), which is the 1 a length-biased variable without any justification pgf of (3). Thus, the length-biased version of the with reference to the theory of probability. But, Poisson distribution is a convolution of the it provides a theoretical base for the collected Poisson random variable and the degenerate data and a general theoretical justification to the distribution of Y. Further, this result shows that statistical inference based of these data. the length-biased version of X has the same Amari (1985) introduced the role of distribution as that of (Z + Y). 1 differential geometrical properties of statistical This observation, interpreted in terms of manifolds, which are well defined objects, in sampling, implies that the random sampling on estimation theory. Clearly, the manifolds of the the original Poisson variable denoted by, Z , and 1 family of distributions given by (2) and (3) are the degenerate random variable Y is equivalent not the same.