Chapter 3
Some Useful Distributions
The two stage trimmed means of Chapter 2 are asymptotically equivalent to a classical trimmed mean provided that $A_n = \text{MED}(n) - k_1\text{MAD}(n) \stackrel{D}{\rightarrow} a$, $B_n = \text{MED}(n) + k_2\text{MAD}(n) \stackrel{D}{\rightarrow} b$, and $100F(a-)$ and $100F(b)$ are not integers. This result will also hold if $k_1$ and $k_2$ depend on $n$. For example, take $k_1 = k_2 = c_1 + c_2/n$. Then $\text{MED}(n) \pm k_1\text{MAD}(n) \stackrel{D}{\rightarrow} \text{MED}(Y) \pm c_1\text{MAD}(Y)$. A trimming rule suggests values for $c_1$ and $c_2$ and depends on the distribution of $Y$. Sometimes the rule is obtained by transforming the random variable $Y$ into another random variable $W$ (e.g. transform a lognormal into a normal) and then using the rule for $W$. These rules may not be as resistant to outliers as rules that do not use a transformation. For example, an observation which does not seem to be an outlier on the log scale may appear as an outlier on the original scale.

Several of the trimming rules in this chapter have been tailored so that the probability is high that none of the observations are trimmed when the sample size is moderate. Robust (but perhaps ad hoc) analogs of classical procedures can be obtained by applying the classical procedure to the data that remains after trimming.

Relationships between the distribution's parameters and $\text{MED}(Y)$ and $\text{MAD}(Y)$ are emphasized. Note that for location–scale families, highly outlier resistant estimates for the two parameters can be obtained by replacing $\text{MED}(Y)$ by $\text{MED}(n)$ and $\text{MAD}(Y)$ by $\text{MAD}(n)$.
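As a concrete illustration of the metric trimming idea, here is a minimal Python sketch (not from the text): it computes $\text{MED}(n)$ and the unscaled $\text{MAD}(n) = \text{MED}(|Y_i - \text{MED}(n)|)$ used in this chapter, then averages the observations that survive the rule $\text{MED}(n) \pm k\,\text{MAD}(n)$. The cutoffs $k_1 = k_2 = 6$ are hypothetical defaults for the illustration, not values recommended by the chapter.

```python
import numpy as np

def med_mad(y):
    """Sample median MED(n) and unscaled median absolute
    deviation MAD(n) = MED(|y_i - MED(n)|)."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    return med, np.median(np.abs(y - med))

def metric_trimmed_mean(y, k1=6.0, k2=6.0):
    """Classical mean of the observations kept by the rule
    MED(n) - k1*MAD(n) <= y_i <= MED(n) + k2*MAD(n).
    k1 = k2 = 6 is an illustrative choice only."""
    y = np.asarray(y, dtype=float)
    med, mad = med_mad(y)
    keep = (y >= med - k1 * mad) & (y <= med + k2 * mad)
    return y[keep].mean()

# one gross outlier barely moves the trimmed mean
y = np.concatenate([np.random.default_rng(0).normal(size=50), [80.0]])
print(np.mean(y), metric_trimmed_mean(y))
```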
Definition 3.1. The moment generating function (mgf) of a random variable $Y$ is
\[ m(t) = E(e^{tY}) \]
provided that the expectation exists for $t$ in some neighborhood of 0.

Definition 3.2. The characteristic function (chf) of a random variable $Y$ is
\[ c(t) = E(e^{itY}) \]
where the complex number $i = \sqrt{-1}$.
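To make Definition 3.1 concrete, here is a quick numerical sanity check (a sketch, not part of the text): a sample average of $e^{tY}$ should approach the closed form mgf, here taken from the binomial family of Section 3.1.

```python
import numpy as np

rng = np.random.default_rng(0)
k, rho, t = 10, 0.3, 0.5

# Monte Carlo estimate of m(t) = E(e^{tY}) for Y ~ BIN(k, rho)
y = rng.binomial(k, rho, size=200_000)
mc = np.exp(t * y).mean()

# closed form mgf ((1 - rho) + rho*e^t)^k from Section 3.1
exact = ((1 - rho) + rho * np.exp(t)) ** k
print(mc, exact)   # the two values should agree to a few decimals
```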
Definition 3.3. The indicator function $I_A(x) \equiv I(x \in A) = 1$ if $x \in A$ and $I_A(x) = 0$ otherwise. Sometimes an indicator function such as $I_{(0,\infty)}(y)$ will be denoted by $I(y > 0)$.
3.1 The Binomial Distribution
If $Y$ has a binomial distribution, $Y \sim \text{BIN}(k, \rho)$, then the probability mass function (pmf) of $Y$ is
\[ P(Y = y) = \binom{k}{y} \rho^y (1-\rho)^{k-y} \]
for $0 < \rho < 1$ and $y = 0, 1, \ldots, k$.
The moment generating function $m(t) = ((1-\rho) + \rho e^t)^k$, and the characteristic function $c(t) = ((1-\rho) + \rho e^{it})^k$.
$E(Y) = k\rho$, and $\text{VAR}(Y) = k\rho(1-\rho)$.

The following normal approximation is often used: $Y \approx N(k\rho, k\rho(1-\rho))$ when $k\rho(1-\rho) > 9$. Hence
\[ P(Y \leq y) \approx \Phi\left(\frac{y + 0.5 - k\rho}{\sqrt{k\rho(1-\rho)}}\right). \]
Also
\[ P(Y = y) \approx \frac{1}{\sqrt{k\rho(1-\rho)}} \, \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}\,\frac{(y - k\rho)^2}{k\rho(1-\rho)}\right). \]
See Johnson, Kotz and Kemp (1992, p. 115). This normal approximation suggests that $\text{MED}(Y) \approx k\rho$ and $\text{MAD}(Y) \approx 0.6745\sqrt{k\rho(1-\rho)}$. Hamza (1995) states that $|E(Y) - \text{MED}(Y)| \leq \max(\rho, 1-\rho)$ and shows that $|E(Y) - \text{MED}(Y)| \leq \log(2)$.
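As a quick check of this approximation (a sketch, not from the text; scipy is assumed for the exact binomial and normal cdfs):

```python
import numpy as np
from scipy import stats

k, rho = 100, 0.4   # k*rho*(1 - rho) = 24 > 9, so the rule of thumb applies
y = 45

exact = stats.binom.cdf(y, k, rho)
approx = stats.norm.cdf((y + 0.5 - k * rho) / np.sqrt(k * rho * (1 - rho)))
print(exact, approx)   # the two values should be close

# implied robust parameters: MED(Y) ~ k*rho, MAD(Y) ~ 0.6745*sqrt(k*rho*(1-rho))
print(k * rho, 0.6745 * np.sqrt(k * rho * (1 - rho)))
```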
Following Olive (2008, ch. 9), let $W = \sum_{i=1}^n Y_i \sim \text{bin}(\sum_{i=1}^n k_i, \rho)$ and let $n_w = \sum_{i=1}^n k_i$. Often $k_i \equiv 1$ and then $n_w = n$. Let $P(F_{d_1,d_2} \leq F_{d_1,d_2}(\alpha)) = \alpha$ where $F_{d_1,d_2}$ has an $F$ distribution with $d_1$ and $d_2$ degrees of freedom. Then the Clopper Pearson "exact" $100(1-\alpha)\%$ CI for $\rho$ is
\[ \left(0, \ \frac{1}{1 + n_w F_{2n_w,2}(\alpha)}\right) \quad \text{for } W = 0, \]
\[ \left(\frac{n_w}{n_w + F_{2,2n_w}(1-\alpha)}, \ 1\right) \quad \text{for } W = n_w, \]
and $(\rho_L, \rho_U)$ for $0 < W < n_w$ where
\[ \rho_L = \frac{W}{W + (n_w - W + 1) F_{2(n_w-W+1),\,2W}(1-\alpha/2)} \]
and
\[ \rho_U = \frac{W+1}{W + 1 + (n_w - W) F_{2(n_w-W),\,2(W+1)}(\alpha/2)}. \]

Suppose $Y_1, \ldots, Y_n$ are iid bin(1, $\rho$). Let $\hat{\rho} =$ number of "successes"$/n$ and let $P(Z \leq z_{1-\alpha/2}) = 1 - \alpha/2$ if $Z \sim N(0,1)$. Let $\tilde{n} = n + z_{1-\alpha/2}^2$ and
\[ \tilde{\rho} = \frac{n\hat{\rho} + 0.5\, z_{1-\alpha/2}^2}{n + z_{1-\alpha/2}^2}. \]
Then the large sample $100(1-\alpha)\%$ Agresti Coull CI for $\rho$ is
\[ \tilde{\rho} \pm z_{1-\alpha/2} \sqrt{\frac{\tilde{\rho}(1-\tilde{\rho})}{\tilde{n}}}. \]
(A code sketch of both intervals appears after Section 3.5.)

Given a random sample of size $n$, the classical estimate of $\rho$ is $\hat{\rho} = \bar{y}_n/k$. If each $y_i$ is a nonnegative integer between 0 and $k$, then a trimming rule is keep $y_i$ if
\[ \text{med}(n) - 5.2\left(1 + \frac{4}{n}\right)\text{mad}(n) \ \leq \ y_i \ \leq \ \text{med}(n) + 5.2\left(1 + \frac{4}{n}\right)\text{mad}(n). \]
(This rule can be very bad if the normal approximation is not good.)

3.2 The Burr Distribution

If $Y$ has a Burr distribution, $Y \sim \text{Burr}(\phi, \lambda)$, then the probability density function (pdf) of $Y$ is
\[ f(y) = \frac{1}{\lambda} \, \frac{\phi y^{\phi-1}}{(1 + y^\phi)^{\frac{1}{\lambda}+1}} \]
where $y$, $\phi$, and $\lambda$ are all positive. The cumulative distribution function (cdf) of $Y$ is
\[ F(y) = 1 - \exp\left(\frac{-\log(1 + y^\phi)}{\lambda}\right) = 1 - (1 + y^\phi)^{-1/\lambda} \quad \text{for } y > 0. \]
$\text{MED}(Y) = [e^{\lambda \log(2)} - 1]^{1/\phi}$. See Patel, Kapadia and Owen (1976, p. 195).
Assume that $\phi$ is known. Since $W = \log(1 + Y^\phi)$ is EXP($\lambda$),
\[ \hat{\lambda} = \frac{\text{MED}(W_1, \ldots, W_n)}{\log(2)} \]
is a robust estimator. If all the $y_i \geq 0$, then a trimming rule is keep $y_i$ if
\[ 0.0 \leq w_i \leq 9.0\left(1 + \frac{2}{n}\right)\text{med}(n) \]
where med($n$) is applied to $w_1, \ldots, w_n$ with $w_i = \log(1 + y_i^\phi)$.

3.3 The Cauchy Distribution

If $Y$ has a Cauchy distribution, $Y \sim C(\mu, \sigma)$, then the pdf of $Y$ is
\[ f(y) = \frac{\sigma}{\pi}\,\frac{1}{\sigma^2 + (y-\mu)^2} = \frac{1}{\pi\sigma\left[1 + \left(\frac{y-\mu}{\sigma}\right)^2\right]} \]
where $y$ and $\mu$ are real numbers and $\sigma > 0$.
The cdf of $Y$ is $F(y) = \frac{1}{\pi}[\arctan((y-\mu)/\sigma) + \pi/2]$. See Ferguson (1967, p. 102).
This family is a location–scale family that is symmetric about $\mu$. The moments of $Y$ do not exist, but the chf of $Y$ is $c(t) = \exp(it\mu - |t|\sigma)$.
$\text{MED}(Y) = \mu$, the upper quartile $= \mu + \sigma$, and the lower quartile $= \mu - \sigma$.
$\text{MAD}(Y) = F^{-1}(3/4) - \text{MED}(Y) = \sigma$. For a standard normal random variable, 99% of the mass is between $-2.58$ and $2.58$, while for a standard Cauchy $C(0,1)$ random variable 99% of the mass is between $-63.66$ and $63.66$. Hence a rule which gives weight one to almost all of the observations of a Cauchy sample will be more susceptible to outliers than rules which do a large amount of trimming.

3.4 The Chi Distribution

If $Y$ has a chi distribution, $Y \sim \chi_p$, then the pdf of $Y$ is
\[ f(y) = \frac{y^{p-1} e^{-y^2/2}}{2^{\frac{p}{2}-1}\Gamma(p/2)} \]
where $y \geq 0$ and $p$ is a positive integer.
\[ \text{MED}(Y) \approx \sqrt{p - 2/3}. \]
See Patel, Kapadia and Owen (1976, p. 38). Since $W = Y^2$ is $\chi^2_p$, a trimming rule is keep $y_i$ if $w_i = y_i^2$ would be kept by the trimming rule for $\chi^2_p$.

3.5 The Chi–square Distribution

If $Y$ has a chi–square distribution, $Y \sim \chi^2_p$, then the pdf of $Y$ is
\[ f(y) = \frac{y^{\frac{p}{2}-1} e^{-\frac{y}{2}}}{2^{p/2}\Gamma\left(\frac{p}{2}\right)} \]
where $y \geq 0$ and $p$ is a positive integer.
$E(Y) = p$. $\text{VAR}(Y) = 2p$.
Since $Y$ is gamma $G(\nu = p/2, \lambda = 2)$,
\[ E(Y^r) = \frac{2^r \Gamma(r + p/2)}{\Gamma(p/2)}, \quad r > -p/2. \]
$\text{MED}(Y) \approx p - 2/3$. See Pratt (1968, p. 1470) for more terms in the expansion of $\text{MED}(Y)$. Empirically,
\[ \text{MAD}(Y) \approx \frac{\sqrt{2p}}{1.483}\left(1 - \frac{2}{9p}\right)^2 \approx 0.9536\sqrt{p}. \]
Note that $p \approx \text{MED}(Y) + 2/3$, and $\text{VAR}(Y) \approx 2\,\text{MED}(Y) + 4/3$. Let $i$ be an integer such that $i \leq w < i + 1$; then, since $p$ is an integer, $p$ can be estimated by rounding $\text{MED}(n) + 2/3$ to the nearest integer.
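As a quick numerical check of the $\chi^2_p$ approximations above (a sketch assuming scipy is available, not from the text): $\text{MAD}(Y) = \text{MED}(|Y - \text{MED}(Y)|)$ is found by solving $P(\text{MED}(Y) - m \leq Y \leq \text{MED}(Y) + m) = 1/2$ numerically.

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

p = 10
med = stats.chi2.median(p)
print(med, p - 2/3)                      # MED(Y) vs the p - 2/3 approximation

# MAD(Y) solves P(med - m <= Y <= med + m) = 1/2 for continuous Y
mad = brentq(lambda m: stats.chi2.cdf(med + m, p)
                       - stats.chi2.cdf(med - m, p) - 0.5,
             1e-8, 10 * np.sqrt(p))
approx = np.sqrt(2 * p) / 1.483 * (1 - 2 / (9 * p)) ** 2
print(mad, approx, 0.9536 * np.sqrt(p))  # exact vs the two approximations
```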
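Returning to the binomial intervals of Section 3.1, here is a minimal sketch of both confidence intervals (scipy is assumed for the $F$ and normal quantiles; an illustration, not code from the text). Here `stats.f.ppf(a, d1, d2)` plays the role of the quantile $F_{d_1,d_2}(a)$.

```python
import numpy as np
from scipy import stats

def clopper_pearson(W, nw, alpha=0.05):
    """Clopper Pearson "exact" 100(1-alpha)% CI for rho (Section 3.1)."""
    F = stats.f.ppf                      # F(a, d1, d2) = a quantile of F_{d1,d2}
    if W == 0:
        return 0.0, 1.0 / (1.0 + nw * F(alpha, 2 * nw, 2))
    if W == nw:
        return nw / (nw + F(1 - alpha, 2, 2 * nw)), 1.0
    lo = W / (W + (nw - W + 1) * F(1 - alpha / 2, 2 * (nw - W + 1), 2 * W))
    hi = (W + 1) / (W + 1 + (nw - W) * F(alpha / 2, 2 * (nw - W), 2 * (W + 1)))
    return lo, hi

def agresti_coull(W, n, alpha=0.05):
    """Large sample 100(1-alpha)% Agresti Coull CI for rho (Section 3.1)."""
    z = stats.norm.ppf(1 - alpha / 2)
    ntilde = n + z**2
    rtilde = (W + 0.5 * z**2) / ntilde   # n*rhohat = W successes
    half = z * np.sqrt(rtilde * (1 - rtilde) / ntilde)
    return rtilde - half, rtilde + half

print(clopper_pearson(5, 10))            # roughly (0.187, 0.813)
print(agresti_coull(5, 10))
```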