
Supplementary information on the negative binomial distribution

November 25, 2013

1 The Poisson distribution

Consider a segment of DNA as a collection of a large number, say N, of individual intervals over which a break can take place. Furthermore, suppose the probability of a break in each interval is the same and is very small, say p(N), and breaks occur in the intervals independently of one another. Here, we need p to be a function of N because if there are more locations (i.e. N is increased) we need to make p smaller if we want to have the same model (since there are more opportunities for breaks). Since the number of breaks is a cumulative sum of iid Bernoulli variables, the distribution of breaks, y, is binomially distributed with parameters (N, p(N)), that is

\[
P(y = k) = \binom{N}{k} p(N)^k (1 - p(N))^{N-k}.
\]

If we suppose p(N) = λ/N, for some λ, and we let N become arbitrarily large, then we find

\[
\begin{aligned}
P(y = k) &= \lim_{N\to\infty} \binom{N}{k} p(N)^k (1 - p(N))^{N-k} \\
&= \lim_{N\to\infty} \frac{N \cdots (N-k+1)}{k!} (\lambda/N)^k (1 - \lambda/N)^{N-k} \\
&= \lim_{N\to\infty} \frac{\lambda^k}{k!} \, \frac{N \cdots (N-k+1)}{N^k} \, (1 - \lambda/N)^N (1 - \lambda/N)^{-k} \\
&= \lambda^k e^{-\lambda}/k!,
\end{aligned}
\]
which is the Poisson distribution. It has a mean and variance of λ. The mean of a discrete random variable Y is $EY = \sum_k k P(Y = k)$ and the variance is $E(Y - EY)^2$. We usually compute the variance using $\mathrm{Var}\, Y = EY^2 - (EY)^2$.
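As a quick numerical illustration of this limit (my own sketch, not part of the derivation), the binomial probabilities with p(N) = λ/N can be compared to the Poisson probabilities in R; the choices λ = 3 and k = 0, ..., 10 are arbitrary:

# Sketch: binomial(N, lambda/N) probabilities approach Poisson(lambda)
# probabilities as N grows (lambda and the grid of k chosen arbitrarily).
lambda <- 3
k <- 0:10
for (N in c(10, 100, 10000)) {
  cat("N =", N,
      "max abs difference =",
      max(abs(dbinom(k, size = N, prob = lambda / N) - dpois(k, lambda))),
      "\n")
}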

2 The gamma distribution

The gamma function is a commonly encountered function in applied math and is defined via
\[
\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x} \, dx.
\]
The gamma distribution is a flexible class of distributions on the positive real line. There are a variety of ways to parameterize the gamma distribution:
\[
f_1(x) = \frac{\beta^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}
\]

(which has mean α/β) and
\[
f_2(x) = \frac{1}{\beta^\alpha \Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}
\]
(which has mean αβ) are common, but if we want to model the mean of a gamma distribution neither of these is very good, in that the mean depends on two parameters (unlike the normal or Poisson distribution). We can get around that by altering $f_1$, say, so that its mean is a single parameter. We do this by defining µ = α/β, or just setting β = α/µ, which gives

\[
f_3(x) = \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\alpha x/\mu}.
\]
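As a sketch (not in the original text), $f_3$ corresponds in R to a gamma density with shape = α and rate = α/µ, so its mean should be µ; this can be checked numerically with arbitrary values α = 2 and µ = 4:

# Sketch: f3 coded directly from its definition; its integral should be 1
# and its mean should be mu (alpha = 2, mu = 4 chosen arbitrarily).
alpha <- 2; mu <- 4
f3 <- function(x) (alpha / mu)^alpha / gamma(alpha) * x^(alpha - 1) * exp(-alpha * x / mu)
integrate(f3, 0, Inf)$value                          # should be ~1
integrate(function(x) x * f3(x), 0, Inf)$value       # should be ~mu = 4
mean(rgamma(1e5, shape = alpha, rate = alpha / mu))  # simulation check, ~4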

3 The negative binomial distribution

Now use $f_3$ to get an expression for the mass function of a negative binomially distributed random variable. Here Y conditional on λ is Poisson with mean λ, λ has density $f_3$, and we integrate λ out to get the marginal mass function of Y. Proceed as follows:
\[
\begin{aligned}
P(Y = k) &= \int_0^\infty \frac{\lambda^k e^{-\lambda}}{k!} \, \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\alpha\lambda/\mu} \, d\lambda \\
&= \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, \frac{1}{k!} \int_0^\infty \lambda^{\alpha+k-1} e^{-\lambda(\alpha/\mu + 1)} \, d\lambda \\
&= \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, \frac{1}{k!}\, \frac{\Gamma(\alpha+k)}{(1 + \alpha/\mu)^{\alpha+k}},
\end{aligned}
\]
which after a little algebra gives
\[
P(Y = k) = \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,k!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k}. \tag{1}
\]
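A numerical check of equation (1) (my own sketch, not part of the derivation): the expression should match R's dnbinom with size = α and mu = µ, and the Poisson-gamma mixture can also be simulated directly. The values α = 2 and µ = 4 are arbitrary.

# Sketch: equation (1) versus dnbinom, and versus simulating the mixture
# (lambda from f3, then Y | lambda Poisson); alpha = 2, mu = 4 arbitrary.
alpha <- 2; mu <- 4; k <- 0:8
eq1 <- gamma(alpha + k) / (gamma(alpha) * factorial(k)) *
  (alpha / (alpha + mu))^alpha * (mu / (alpha + mu))^k
cbind(eq1, dnbinom = dnbinom(k, size = alpha, mu = mu))   # columns should agree
y <- rpois(1e5, rgamma(1e5, shape = alpha, rate = alpha / mu))
table(factor(y, levels = k)) / 1e5                        # close to eq1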

3.1 Moments of the negative binomial distribution: route 1

Now suppose that Y conditional on λ is Poisson with mean λ and that λ is distributed according to a gamma distribution parameterized as $f_3$ above. We can use some tricks about conditional expectations to get the mean and variance of Y in terms of the parameters µ and α. These are, for two random variables X and Y, the law of iterated expectation
\[
E[Y] = E\,E[Y \mid X]
\]
and the conditional variance formula
\[
\mathrm{Var}\, Y = \mathrm{Var}\, E(Y \mid X) + E\, \mathrm{Var}(Y \mid X).
\]
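These two identities can be illustrated by simulation (a sketch with an arbitrary, hypothetical choice of X; nothing here depends on the gamma setup yet):

# Sketch: check the two identities by simulation with X ~ Uniform(1, 5)
# and Y | X ~ Poisson(X), so E[Y|X] = X and Var(Y|X) = X.
n <- 1e5
x <- runif(n, 1, 5)
y <- rpois(n, x)
c(mean(y), mean(x))            # E Y should match E E[Y|X] = E X
c(var(y), var(x) + mean(x))    # Var Y should match Var X + E X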

Now let X be λ:
\[
E[Y] = E\,E[Y \mid \lambda],
\]
but conditional on λ, Y is Poisson with mean λ, so $E[Y \mid \lambda] = \lambda$ and so
\[
E[Y] = E[\lambda].
\]
But λ has mean µ, and so E[Y] = µ. Next,
\[
\mathrm{Var}\, Y = \mathrm{Var}\, E(Y \mid \lambda) + E\, \mathrm{Var}(Y \mid \lambda),
\]
but $E(Y \mid \lambda) = \lambda$ and $\mathrm{Var}(Y \mid \lambda) = \lambda$, so
\[
\mathrm{Var}\, Y = \mathrm{Var}[\lambda] + E[\lambda].
\]

We compute $E\lambda^2$ by doing
\[
\begin{aligned}
E\lambda^2 &= \int_0^\infty \lambda^2 f_3(\lambda)\, d\lambda \\
&= \int_0^\infty \lambda^2\, \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, \lambda^{\alpha-1} e^{-\alpha\lambda/\mu}\, d\lambda \\
&= \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)} \int_0^\infty \lambda^{\alpha+1} e^{-\alpha\lambda/\mu}\, d\lambda \\
&= \frac{(\alpha/\mu)^\alpha}{\Gamma(\alpha)}\, \frac{\Gamma(\alpha+2)}{(\alpha/\mu)^{\alpha+2}} \\
&= \left(\frac{\mu}{\alpha}\right)^2 \alpha(\alpha+1).
\end{aligned}
\]
Now use the variance formula, $\mathrm{Var}\, Y = EY^2 - (EY)^2$, applied to λ, to get

\[
\mathrm{Var}\,\lambda = \left(\frac{\mu}{\alpha}\right)^2 \alpha(\alpha+1) - \mu^2 = \mu^2/\alpha.
\]
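A quick numeric check of $E\lambda^2$ and $\mathrm{Var}\,\lambda$ (my own sketch; α = 2 and µ = 4 are arbitrary and $f_3$ is coded directly from its definition above):

# Sketch: E[lambda^2] and Var(lambda) under f3, numerically and in
# closed form (alpha = 2, mu = 4 chosen arbitrarily).
alpha <- 2; mu <- 4
f3 <- function(x) (alpha / mu)^alpha / gamma(alpha) * x^(alpha - 1) * exp(-alpha * x / mu)
integrate(function(x) x^2 * f3(x), 0, Inf)$value     # ~24
(mu / alpha)^2 * alpha * (alpha + 1)                 # 24
var(rgamma(1e6, shape = alpha, rate = alpha / mu))   # ~ mu^2/alpha = 8
mu^2 / alpha                                         # 8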

Now use this in the previous expression for Var Y

\[
\mathrm{Var}\, Y = \mathrm{Var}[\lambda] + E[\lambda] = \mu^2/\alpha + \mu = \mu(1 + \mu/\alpha).
\]

So to check this in R:

> mean(rnbinom(1000,size=2,mu=4))
[1] 4.094
> var(rnbinom(1000,size=2,mu=4))
[1] 12.25368
> 4*(1+4/2)
[1] 12

So if you parameterize the gamma distribution as in $f_3$, then what R calls the mean is µ, and what the R functions that generate deviates from this distribution (and evaluate the mass function, etc.) call the size is α. The help file calls the size parameter the dispersion parameter; however, the authors of edgeR call 1/α the dispersion. I like to use R's nomenclature, and I prefer to work with this parameterization because if there is extensive overdispersion then the dispersion parameters will be small and less variable than under the edgeR parameterization, in which they will be all over the map (which makes shrinkage estimation more problematic).
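As a small illustration of the relationship between the two conventions (a sketch; α = 2 and µ = 4 as in the check above), the variance can be written either with R's size = α or with an edgeR-style dispersion 1/α:

# Sketch: R's size is alpha; an edgeR-style dispersion is 1/alpha.
alpha <- 2; mu <- 4
disp <- 1 / alpha
mu * (1 + mu / alpha)   # variance written with size,           = 12
mu + disp * mu^2        # same variance written with dispersion, = 12
all.equal(dnbinom(0:10, size = alpha, mu = mu),
          dnbinom(0:10, size = 1 / disp, mu = mu))   # same mass function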

3.2 Moments of the negative binomial distribution: route 2

You can also bang this all out using the definition of expectation and equation (1). Start with $EY = \sum_{k=0}^{\infty} k P(Y = k) = \sum_{k=1}^{\infty} k P(Y = k)$, then substitute in expression (1) to get

\[
\begin{aligned}
EY &= \sum_{k=1}^{\infty} k \, \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,k!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k} \\
&= \left(\frac{\mu}{\alpha+\mu}\right) \sum_{k=1}^{\infty} \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,(k-1)!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k-1}.
\end{aligned}
\]

Next we use a property of the gamma function that follows from integration by parts, Γ(x) = (x − 1)Γ(x − 1), to get

\[
EY = \left(\frac{\mu}{\alpha+\mu}\right) \sum_{k=1}^{\infty} \frac{(\alpha+k-1)\,\Gamma(\alpha+k-1)}{\Gamma(\alpha)\,(k-1)!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k-1}.
\]
Now define j = k − 1 and substitute into the equation to get
\[
\begin{aligned}
EY &= \left(\frac{\mu}{\alpha+\mu}\right) \sum_{j=0}^{\infty} \frac{(\alpha+j)\,\Gamma(\alpha+j)}{\Gamma(\alpha)\,j!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{j} \\
&= \left(\frac{\mu}{\alpha+\mu}\right) \sum_{j=0}^{\infty} \left[ j\,\frac{\Gamma(\alpha+j)}{\Gamma(\alpha)\,j!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{j} + \alpha\,\frac{\Gamma(\alpha+j)}{\Gamma(\alpha)\,j!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{j} \right] \\
&= \left(\frac{\mu}{\alpha+\mu}\right)(EY + \alpha),
\end{aligned}
\]
which is easily solved to give EY = µ.

You can then get the variance by using the variance formula. You start by computing $EY^2$ using ideas similar to those used for the mean, with this additional trick:
\[
\begin{aligned}
EY^2 &= \sum_{k=0}^{\infty} k^2 P(Y = k) \\
&= \sum_{k=1}^{\infty} k^2 P(Y = k) \\
&= \sum_{k=1}^{\infty} [k(k-1) + k] P(Y = k) \\
&= \sum_{k=2}^{\infty} k(k-1) P(Y = k) + \sum_{k=1}^{\infty} k P(Y = k) \\
&= \sum_{k=2}^{\infty} k(k-1) P(Y = k) + EY.
\end{aligned}
\]
Now you use the fact that the first two factors of the factorial in the denominator of the mass function will cancel the k(k − 1) factor in front of P(Y = k), you extract $\left(\frac{\mu}{\alpha+\mu}\right)^2$, and you use the trick for the gamma function twice.

\[
\begin{aligned}
\sum_{k=2}^{\infty} k(k-1)P(Y=k) &= \sum_{k=2}^{\infty} \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)\,(k-2)!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k} \\
&= \left(\frac{\mu}{\alpha+\mu}\right)^2 \sum_{k=2}^{\infty} \frac{(\alpha+k-1)(\alpha+k-2)\,\Gamma(\alpha+k-2)}{\Gamma(\alpha)\,(k-2)!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{k-2}.
\end{aligned}
\]
Now set j = k − 2:
\[
\begin{aligned}
\sum_{k=2}^{\infty} k(k-1)P(Y=k) &= \left(\frac{\mu}{\alpha+\mu}\right)^2 \sum_{j=0}^{\infty} \frac{(\alpha+j+1)(\alpha+j)\,\Gamma(\alpha+j)}{\Gamma(\alpha)\,j!} \left(\frac{\alpha}{\alpha+\mu}\right)^{\alpha} \left(\frac{\mu}{\alpha+\mu}\right)^{j} \\
&= \left(\frac{\mu}{\alpha+\mu}\right)^2 \sum_{j=0}^{\infty} (\alpha+j+1)(\alpha+j)\, P(Y=j) \\
&= \left(\frac{\mu}{\alpha+\mu}\right)^2 \left[\alpha^2 + 2\alpha\mu + EY^2 + \alpha + \mu\right].
\end{aligned}
\]

Now use this in the expression for $EY^2$ to get

\[
EY^2 = \left(\frac{\mu}{\alpha+\mu}\right)^2 \left[\alpha^2 + 2\alpha\mu + EY^2 + \alpha + \mu\right] + \mu,
\]
and so
\[
EY^2 = \mu^2 + \mu(\alpha+\mu)\,\frac{\alpha+2\mu}{(\alpha+\mu)^2 - \mu^2}.
\]
Now use this in the variance formula to get
\[
\mathrm{Var}\, Y = \mu(\alpha+\mu)\,\frac{\alpha+2\mu}{\alpha^2 + 2\alpha\mu} = \mu(1 + \mu/\alpha),
\]
which agrees with the previous expression.
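Both route-2 moments can also be checked by direct summation of equation (1) (my own sketch; the terms are computed on the log scale to avoid overflow in the gamma and factorial functions, and α = 2, µ = 4 are arbitrary as before):

# Sketch: direct summation of equation (1) on the log scale; the mean
# should be mu = 4 and the variance mu*(1 + mu/alpha) = 12.
alpha <- 2; mu <- 4; k <- 0:500
logp <- lgamma(alpha + k) - lgamma(alpha) - lfactorial(k) +
  alpha * log(alpha / (alpha + mu)) + k * log(mu / (alpha + mu))
p <- exp(logp)
sum(k * p)                      # ~4
sum(k^2 * p) - sum(k * p)^2     # ~12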
