Some General Results
Appendix: Some General Results

A.1 Multinomial Distribution

Consider the following multinomial distribution coming from $n$ independent multinomial trials, with $x_i$ the number of counts occurring in the cell with probability $p_i$ ($i = 1, 2, \ldots, k$), namely

$$\Pr[\{x_i\}] = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} p_i^{x_i}
= \binom{n}{x_1, x_2, \ldots, x_k} \prod_{i=1}^{k} p_i^{x_i}, \qquad
\sum_{i=1}^{k} x_i = n, \quad \sum_{i=1}^{k} p_i = 1. \tag{A.1}$$

Here $\Pr[\{x_i\}]$ stands for $\Pr[\{x_i\}] = \Pr[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k]$, the "singular" (and symmetrical) version of the distribution, as we need the identifiability constraint $\sum_{i=1}^{k} p_i = 1$. We can express this distribution briefly as the singular distribution $(x_1, x_2, \ldots, x_k) \sim \text{Multinomial}(n, \mathbf{p})$.

We obtain the nonsingular distribution by writing $p_k = 1 - \sum_{i=1}^{k-1} p_i$ and $x_k = n - \sum_{i=1}^{k-1} x_i$. In this respect, the notations for the singular and nonsingular distributions are sometimes confused in the literature, especially the definition

$$\binom{n}{\{x_i\}} = \frac{n!}{\prod_{i=1}^{k} x_i!}.$$

The binomial distribution, written $\text{Binomial}(n, p)$, has $k = 2$, and the Bernoulli distribution, written $\text{Bernoulli}(x, p)$, is $\text{Binomial}(1, p)$ for a discrete random variable $x$ taking the values 1 and 0.

© Springer Nature Switzerland AG 2019
G. A. F. Seber and M. R. Schofield, Capture-Recapture: Parameter Estimation for Open Animal Populations, Statistics for Biology and Health, https://doi.org/10.1007/978-3-030-18187-1

A.1.1 Some Properties

By adding appropriate cells together we see that the marginal distribution of any subset of a multinomial distribution is also multinomial, with the appropriate $p_i$'s added together. If we finally end up with just two pooled cells, we have the binomial distribution. We now show how a nonsingular multinomial distribution can be represented by the product of conditional binomial distributions.
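The pmf (A.1) and the cell-pooling property are easy to check numerically. The sketch below is plain Python (the helper names `multinomial_pmf` and `binomial_pmf` are ours, not from the text): it sums the joint pmf over the remaining cells and confirms that the marginal distribution of $X_1$ is $\text{Binomial}(n, p_1)$.

```python
from math import comb, factorial, prod

def multinomial_pmf(xs, ps):
    # Pr[{x_i}] from (A.1): n! / prod(x_i!) * prod(p_i ** x_i)
    n = sum(xs)
    coef = factorial(n) // prod(factorial(x) for x in xs)
    return coef * prod(p ** x for p, x in zip(ps, xs))

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, ps = 6, [0.2, 0.3, 0.5]
# Marginal of X1: sum the joint pmf over all (x2, x3) with x2 + x3 = n - x1.
for x1 in range(n + 1):
    marginal = sum(multinomial_pmf([x1, x2, n - x1 - x2], ps)
                   for x2 in range(n - x1 + 1))
    assert abs(marginal - binomial_pmf(x1, n, ps[0])) < 1e-12
```

The same pooling argument works for any partition of the cells, with the corresponding $p_i$'s added.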
To do this, we note first that if $x_1$ and $x_2$ have a multinomial distribution and $x_1$ has a binomial distribution, we find that

$$\Pr[x_2 \mid x_1] = \frac{\Pr[x_2, x_1]}{\Pr[x_1]}
= \binom{n - x_1}{x_2} \left(\frac{p_2}{1 - p_1}\right)^{x_2} \left(1 - \frac{p_2}{1 - p_1}\right)^{n - x_1 - x_2}.$$

Now

$$\Pr[x_1, x_2, \ldots, x_k] = \Pr[x_1]\,\Pr[x_2 \mid x_1]\,\Pr[x_3 \mid x_1, x_2] \cdots \Pr[x_k \mid x_1, x_2, \ldots, x_{k-1}],$$

and we have shown that for $k = 2$ we have

$$\Pr[x_1, x_2] = \Pr[x_1]\,\Pr[x_2 \mid x_1],$$

where both distributions are binomial. We can then show by induction that the factorization of $\Pr[x_1, x_2, \ldots, x_k]$ gives a product of conditional binomial distributions.

We now consider two useful techniques applied by Robson and Youngs (1971) and referred to as "peeling" and "pooling" by Burnham (1991) in using multinomial distributions.

A.1.2 Peeling Process

The peeling process with multinomial distributions can be described as follows. Suppose we wish to peel off the probability distribution of $X_1$. If $\sum_{i=2}^{k} x_i = x$ and $\sum_{i=2}^{k} p_i = 1 - p_1 = q_1$, then

$$\begin{aligned}
\Pr[X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k \text{ with } \textstyle\sum_{i=1}^{k} x_i = n]
&= \Pr[X_1 = x_1]\,\Pr[X_2 = x_2, \ldots, X_k = x_k \mid X_1 = x_1] \\
&= \Pr[X_1 = x_1]\,\Pr[X_2 = x_2, \ldots, X_k = x_k \text{ with } \textstyle\sum_{i=2}^{k} x_i = n - x_1] \\
&= \binom{n}{x_1} p_1^{x_1} q_1^{n - x_1} \binom{n - x_1}{x_2, x_3, \ldots, x_k} \prod_{i=2}^{k} \left(\frac{p_i}{q_1}\right)^{x_i}.
\end{aligned}$$

A.1.3 Pooling Process

The pooling process begins with two independent singular multinomial distributions with the same number of cells and the same cell probabilities, namely

$$\Pr[\{x_i\}] = \binom{n_1}{x_1, x_2, \ldots, x_k} \prod_{i=1}^{k} p_i^{x_i}, \qquad \sum_{i=1}^{k} x_i = n_1,$$

and

$$\Pr[\{y_i\}] = \binom{n_2}{y_1, y_2, \ldots, y_k} \prod_{i=1}^{k} p_i^{y_i}, \qquad \sum_{i=1}^{k} y_i = n_2.$$

If we "add" the two distributions together (the convolution), we get

$$\Pr[\{x_i + y_i\}] = \binom{n_1 + n_2}{x_1 + y_1, \ldots, x_k + y_k} \prod_{i=1}^{k} p_i^{x_i + y_i}, \qquad \sum_{i=1}^{k} (x_i + y_i) = n_1 + n_2.$$

This can be proved using moment generating functions, as the moment generating function of the sum is the product of the two moment generating functions, or simply by observing that we have $n_1 + n_2$ multinomial trials with the same set of probabilities $\{p_i\}$.
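The peeling identity of A.1.2 can be verified directly for a small case. A minimal sketch (the helper name `multinomial_pmf` is again our own): it compares the full pmf with the product of a binomial factor for $x_1$ and a multinomial on the remaining cells with renormalized probabilities $p_i / q_1$.

```python
from math import comb, factorial, prod

def multinomial_pmf(xs, ps):
    # Pr[{x_i}] from (A.1)
    n = sum(xs)
    coef = factorial(n) // prod(factorial(x) for x in xs)
    return coef * prod(p ** x for p, x in zip(ps, xs))

n, ps = 7, [0.1, 0.2, 0.3, 0.4]
xs = [2, 1, 3, 1]
q1 = 1 - ps[0]

# Peeled form: Binomial(x1; n, p1) times a multinomial on the remaining
# cells with probabilities p_i / q1 and sample size n - x1.
binom_part = comb(n, xs[0]) * ps[0] ** xs[0] * q1 ** (n - xs[0])
rest_part = multinomial_pmf(xs[1:], [p / q1 for p in ps[1:]])
assert abs(multinomial_pmf(xs, ps) - binom_part * rest_part) < 1e-12
```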
Now, because of the independence, we have

$$\begin{aligned}
\Pr[\{x_i\}]\,\Pr[\{y_i\}]
&= \binom{n_1}{x_1, x_2, \ldots, x_k} \binom{n_2}{y_1, y_2, \ldots, y_k} \prod_{i=1}^{k} p_i^{x_i + y_i} \\
&= \binom{n_1 + n_2}{x_1 + y_1, x_2 + y_2, \ldots, x_k + y_k} \prod_{i=1}^{k} p_i^{x_i + y_i} \\
&\quad \times \binom{n_1}{x_1, x_2, \ldots, x_k} \binom{n_2}{y_1, y_2, \ldots, y_k} \bigg/ \binom{n_1 + n_2}{x_1 + y_1, x_2 + y_2, \ldots, x_k + y_k};
\end{aligned}$$

that is, the product of the two probability distributions is the distribution of their sum times a hypergeometric distribution.

A.1.4 Conditional Distribution

Suppose we have a nonsingular distribution

$$\Pr[x_1, x_2] = \frac{n!}{x_1!\,x_2!\,(n - x_1 - x_2)!}\, p_1^{x_1} p_2^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2}.$$

If $y = x_1 + x_2$, then $y$ has probability function

$$\Pr[y] = \binom{n}{y} (p_1 + p_2)^y (1 - p_1 - p_2)^{n - y}$$

and

$$\Pr[x_1 \mid y] = \frac{\Pr[x_1, y]}{\Pr[y]} = \frac{\Pr[x_1, x_2]}{\Pr[y]}
= \binom{y}{x_1} \left(\frac{p_1}{p_1 + p_2}\right)^{x_1} \left(\frac{p_2}{p_1 + p_2}\right)^{x_2},$$

which is a binomial distribution.

A.2 Delta Method

We consider general ideas only, without getting too involved with technical details about limits (see also Agresti 2013: Sect. 16.1). Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$, and let $Y = g(X)$ be a "well-behaved" function of $X$ that has a Taylor expansion

$$g(X) - g(\mu) = (X - \mu)\, g'(\mu) + \tfrac{1}{2} (X - \mu)^2 g''(X_0),$$

where $X_0$ lies between $X$ and $\mu$, $g'(\mu)$ is the derivative of $g$ evaluated at $X = \mu$, and $g''(X_0)$ is the second derivative of $g$ evaluated at $X = X_0$. Then, taking expected values,

$$\mathrm{E}[g(X)] \approx g(\mu) + \tfrac{1}{2} \sigma^2 g''(\mu). \tag{A.2}$$

Assuming second-order terms can be neglected, we have $\mathrm{E}[Y] \approx g(\mu)$ and

$$\mathrm{var}(Y) \approx \mathrm{E}[(g(X) - g(\mu))^2] \approx \mathrm{E}[(X - \mu)^2]\,[g'(\mu)]^2 = \sigma^2 [g'(\mu)]^2. \tag{A.3}$$

For example, if $g(X) = \log X$ then, for large $\mu$,

$$\mathrm{var}(\log X) \approx \frac{\sigma^2}{\mu^2}. \tag{A.4}$$

If $\mathbf{X} = (X_1, X_2, \ldots, X_k)'$ is a vector with mean $\boldsymbol{\mu}$, then for suitable $g$ we have the first-order Taylor expansion

$$Y = g(\mathbf{X}) - g(\boldsymbol{\mu}) \approx \sum_{i=1}^{k} (X_i - \mu_i)\, g_i(\boldsymbol{\mu}), \tag{A.5}$$

where $g_i(\boldsymbol{\mu})$ is $\partial g / \partial X_i$ evaluated at $\mathbf{X} = \boldsymbol{\mu}$. Then

$$\begin{aligned}
\mathrm{var}[Y] &\approx \mathrm{E}[(g(\mathbf{X}) - g(\boldsymbol{\mu}))^2] \\
&\approx \mathrm{E}\Biggl[\sum_{i=1}^{k} \sum_{j=1}^{k} (X_i - \mu_i)(X_j - \mu_j)\, g_i(\boldsymbol{\mu})\, g_j(\boldsymbol{\mu})\Biggr] \\
&= \sum_{i=1}^{k} \sum_{j=1}^{k} \mathrm{cov}[X_i, X_j]\, g_i(\boldsymbol{\mu})\, g_j(\boldsymbol{\mu}). \tag{A.6}
\end{aligned}$$

A "quick and dirty" method for a product or ratio of two variables is as follows. If $Y = X_1 / X_2$, then taking logs and differentials we get

$$\frac{\delta Y}{Y} = \frac{\delta X_1}{X_1} - \frac{\delta X_2}{X_2}.$$

Squaring and taking expected values gives us

$$\mathrm{var}\left[\frac{X_1}{X_2}\right] \approx \mu_y^2 \left(\frac{\mathrm{var}[X_1]}{\mu_1^2} + \frac{\mathrm{var}[X_2]}{\mu_2^2} - \frac{2\,\mathrm{cov}[X_1, X_2]}{\mu_1 \mu_2}\right), \tag{A.7}$$

where $\mu_y \approx \mu_1 / \mu_2$. For a product $X_1 X_2$, we simply replace the minus sign by a plus sign and $\mu_y$ by $\mu_1 \mu_2$.

Sometimes we wish to derive asymptotic variances and covariances of parameters using Taylor expansions and the delta method. This can be very onerous, but there are a few shortcuts or "rules" that have been brought together by Seber (1967) and Jolly (1965), which we now mention.

A.2.1 Application to the Multinomial Distribution

Suppose $\mathbf{X}$ has the multinomial distribution given by (A.1) and

$$g(\mathbf{X}) = \frac{X_1 X_2 \cdots X_r}{X_{r+1} X_{r+2} \cdots X_s} \qquad (s \le k).$$

Then, using (A.5),

$$\frac{g(\mathbf{X}) - g(\boldsymbol{\mu})}{g(\boldsymbol{\mu})} \approx \sum_{i=1}^{r} \frac{X_i - \mu_i}{\mu_i} - \sum_{i=r+1}^{s} \frac{X_i - \mu_i}{\mu_i}.$$

Now squaring the above equation, taking expected values, and using $\mu_i = n p_i$ and $\sigma_i^2 = n p_i (1 - p_i)$, we have

$$\frac{\mathrm{var}[X_i]}{\mu_i^2} = \frac{n p_i (1 - p_i)}{n^2 p_i^2} = \frac{1}{n p_i} - \frac{1}{n}$$

and

$$\frac{\mathrm{cov}[X_i, X_j]}{\mu_i \mu_j} = \frac{-n p_i p_j}{n^2 p_i p_j} = -\frac{1}{n}.$$

We now have an expression like (A.6) involving three sets of covariances (except when $s = r + 1$), so that

$$\mathrm{var}[g(\mathbf{X})] \approx [g(\boldsymbol{\mu})]^2 \left[\sum_{i=1}^{s} \mu_i^{-1} - \frac{s}{n} - \frac{2}{n}\left\{\binom{r}{2} + \delta \binom{s - r}{2} - r(s - r)\right\}\right],$$

where $\delta$ is 1 when $s - r \ge 2$ and zero when $s = r + 1$. Then (Seber 1982: 8–9),

$$\mathrm{var}[g(\mathbf{X})] \approx \frac{[g(\boldsymbol{\mu})]^2}{n} \left[\sum_{i=1}^{s} p_i^{-1} - (s - 2r)^2\right], \tag{A.8}$$

for all $s > r$.

Another way of describing the above method is, as in (A.7) above, to take logarithms and then differentials, so that

$$\log g(\mathbf{X}) = \sum_{i=1}^{r} \log X_i - \sum_{i=r+1}^{s} \log X_i$$

and

$$\frac{\delta g(\mathbf{X})}{g(\mathbf{X})} = \sum_{i=1}^{r} \frac{\delta X_i}{X_i} - \sum_{i=r+1}^{s} \frac{\delta X_i}{X_i}.$$

We then square both sides and take expected values. This is the approach used by Cormack (1993a, b), for example, for log-linear models. Two multinomial cases of interest in this monograph are $s = 2$, $r = 1$ and $s = 4$, $r = 2$.
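As a numerical sanity check on (A.8), take the first case $s = 2$, $r = 1$, so that $g(\mathbf{X}) = X_1 / X_2$ and $(s - 2r)^2 = 0$. The sketch below (plain Python; the function name `a8_variance` is our own) compares the delta-method approximation with a Monte Carlo estimate of the variance of the ratio.

```python
import random
from collections import Counter

random.seed(1)

def a8_variance(ps, n, r, s):
    # Delta-method approximation (A.8) for g(X) = (X_1...X_r)/(X_{r+1}...X_s):
    #   var[g(X)] ~ g(mu)^2 / n * (sum_{i=1}^{s} 1/p_i - (s - 2r)^2)
    g_mu = 1.0
    for i in range(r):
        g_mu *= n * ps[i]          # numerator means mu_i = n p_i
    for i in range(r, s):
        g_mu /= n * ps[i]          # denominator means
    return g_mu ** 2 / n * (sum(1.0 / p for p in ps[:s]) - (s - 2 * r) ** 2)

ps, n = [0.3, 0.5, 0.2], 500
theory = a8_variance(ps, n, r=1, s=2)      # g(X) = X1 / X2

# Monte Carlo estimate of var(X1 / X2) over repeated multinomial samples.
vals = []
for _ in range(2000):
    counts = Counter(random.choices(range(3), weights=ps, k=n))
    vals.append(counts[0] / counts[1])
mean = sum(vals) / len(vals)
sim = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
# For n this large, theory and sim should agree to within a few percent.
```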