Ancillary Complete Statistics Summary Ancillary Statistics Complete Statistics Summary ......

Biostatistics 602 - Statistical Inference
Lecture 05: Complete Statistics
Hyun Min Kang
January 24th, 2013

1. What is an ancillary statistic for θ?
2. Can an ancillary statistic be a sufficient statistic?
3. What are the location parameter and the scale parameter?
4. In which cases would ancillary statistics be helpful?

Hyun Min Kang Biostatistics 602 - Lecture 05 January 24th, 2013 1 / 26 Hyun Min Kang Biostatistics 602 - Lecture 05 January 24th, 2013 2 / 26

Last Lecture : Ancillary Statistics

Definition 6.2.16
A statistic $S(\mathbf{X})$ is an ancillary statistic if its distribution does not depend on $\theta$.

Examples of Ancillary Statistics
• $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mu, \sigma^2)$ where $\sigma^2$ is known.
• $s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})^2$ is an ancillary statistic.
• $X_1 - X_2 \sim \mathcal{N}(0, 2\sigma^2)$ is ancillary.
• $(X_1 + X_2)/2 - X_3 \sim \mathcal{N}(0, 1.5\sigma^2)$ is ancillary.
• $\frac{(n-1)s_X^2}{\sigma^2} \sim \chi^2_{n-1}$ is ancillary.

Example : Uniform Ancillary Statistics

Problem
• $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(\theta, \theta + 1)$.
• Show that $R = X_{(n)} - X_{(1)}$ is an ancillary statistic.

Possible Strategies
• Method 1 : Obtain the distribution of $R$ and show that it does not depend on $\theta$.
• Method 2 : Represent $R$ as a function of ancillary statistics, whose distribution does not depend on $\theta$.
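The claim in the uniform example can be checked by simulation before it is proved. The following is a minimal sketch, not part of the lecture: the helper name `mean_range`, the sample size `n = 5`, the replication count, and the seeds are all illustrative assumptions. It estimates the mean of $R$ for several values of $\theta$; if $R$ is ancillary, the estimates should agree up to Monte Carlo error.

```python
import random

def mean_range(theta, n=5, reps=20000, seed=0):
    """Monte Carlo estimate of E[X_(n) - X_(1)] for X_i ~ Uniform(theta, theta + 1).

    n, reps, and seed are illustrative choices, not values from the lecture.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.uniform(theta, theta + 1) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / reps

# Different theta values (with different seeds) should give nearly the same mean.
estimates = {theta: mean_range(theta, seed=i)
             for i, theta in enumerate([-3.0, 0.0, 10.0])}
for theta, est in estimates.items():
    print(f"theta = {theta:6.1f}  estimated E[R] = {est:.3f}")
```

Agreement of the means is only a necessary condition for ancillarity; comparing whole histograms of $R$ across $\theta$ would be a stronger check.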

Method 2 - A Simpler Proof

$$f_X(x \mid \theta) = I(\theta < x < \theta + 1) = I(0 < x - \theta < 1)$$

Let $Y_i = X_i - \theta \sim \text{Uniform}(0, 1)$. Then $X_i = Y_i + \theta$, and $\left|\frac{dx}{dy}\right| = 1$ holds.

$$f_Y(y) = I(0 < y + \theta - \theta < 1)\left|\frac{dx}{dy}\right| = I(0 < y < 1)$$

Then, the range statistic $R$ can be rewritten as follows.

$$R = X_{(n)} - X_{(1)} = (Y_{(n)} + \theta) - (Y_{(1)} + \theta) = Y_{(n)} - Y_{(1)}$$

$Y_{(n)} - Y_{(1)}$ is a function of $Y_1, \cdots, Y_n$, and the joint distribution of $Y_1, \cdots, Y_n$ does not depend on $\theta$. Therefore, $R$ is an ancillary statistic.

Location-Scale Family and Parameters

Definition 3.5.5
Let $f(x)$ be any pdf. Then for any $\mu$, $-\infty < \mu < \infty$, and any $\sigma > 0$, the family of pdfs $f((x-\mu)/\sigma)/\sigma$, indexed by the parameter $(\mu, \sigma)$, is called the location-scale family with standard pdf $f(x)$; $\mu$ is called the location parameter and $\sigma$ is called the scale parameter for the family.

Example
• $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \sim \mathcal{N}(0, 1)$
• $f((x-\mu)/\sigma)/\sigma = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)} \sim \mathcal{N}(\mu, \sigma^2)$
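The normal example of a location-scale family can be verified numerically. The sketch below is an illustration only: the function names and the values of `mu` and `sigma` are assumptions, not from the lecture. It builds the density $f((x-\mu)/\sigma)/\sigma$ from the standard normal pdf and compares it with the $\mathcal{N}(\mu, \sigma^2)$ density written out directly.

```python
import math

def standard_normal_pdf(x):
    """Standard pdf f(x) of N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def location_scale_pdf(f, x, mu, sigma):
    """Member of the location-scale family: f((x - mu) / sigma) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return f((x - mu) / sigma) / sigma

def normal_pdf(x, mu, sigma):
    """N(mu, sigma^2) density written out directly."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 1.5, 2.0  # arbitrary illustrative values
for x in [-2.0, 0.0, 1.5, 4.0]:
    a = location_scale_pdf(standard_normal_pdf, x, mu, sigma)
    b = normal_pdf(x, mu, sigma)
    print(f"x = {x:5.1f}  shifted/scaled f = {a:.6f}  N(mu, sigma^2) pdf = {b:.6f}")
```

Passing the standard pdf as an argument means the same helper works for any other standard pdf, which mirrors the "let $f(x)$ be any pdf" wording of the definition.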

Complete Statistics

Definition
• Let $\mathcal{T} = \{ f_T(t \mid \theta), \theta \in \Omega \}$ be a family of pdfs or pmfs for a statistic $T(\mathbf{X})$.
• The family of probability distributions is called complete if
• $E[g(T) \mid \theta] = 0$ for all $\theta$ implies $\Pr[g(T) = 0 \mid \theta] = 1$ for all $\theta$.
• In other words, $g(T) = 0$ almost surely.
• Equivalently, $T(\mathbf{X})$ is called a complete statistic.

Example
• $T(\mathbf{X}) \sim \mathcal{N}(0, 1)$
• $g_1(T(\mathbf{X})) = 0 \implies \Pr[g_1(T(\mathbf{X})) = 0] = 1$.
• $g_2(T(\mathbf{X})) = I(T(\mathbf{X}) = 0) \implies \Pr[g_2(T(\mathbf{X})) = 0] = 1 - \Pr[T(\mathbf{X}) = 0] = 1$. In this case, $g_2(T(\mathbf{X})) = 0$ is almost surely true.

Notes on Complete Statistics

• Notice that completeness is a property of a family of probability distributions, not of a particular distribution.
• For example, $X \sim \mathcal{N}(0, 1)$ and $g(x) = x$ makes $E[g(X)] = EX = 0$, but $\Pr(g(X) = 0) = 0$ instead of 1.
• The above example is only for a particular distribution, not a family of distributions.
• If $X \sim \mathcal{N}(\theta, 1)$, $-\infty < \theta < \infty$, then no function of $X$ except for $g(X) = 0$ satisfies $E[g(X) \mid \theta] = 0$ for all $\theta$.
• Therefore, the family of $\mathcal{N}(\theta, 1)$ distributions, $-\infty < \theta < \infty$, is complete.

Why "Complete" Statistics?

Stigler (1972) Am. Stat. 26(2):28-9
• While a student encountering completeness for the first time is very likely to appreciate its usefulness, he is just as likely to be puzzled by its name, and wonder what connection (if any) there is between the statistical use of the term "complete", and the dictionary definition: lacking none of the parts, whole, entire.
• Requiring $g(T)$ to satisfy the definition puts a restriction on $g$. The larger the family of pdfs/pmfs, the greater the restriction on $g$. When the family of pdfs/pmfs is augmented to the point that requiring $E[g(T)] = 0$ for all $\theta$ rules out all $g$ except for the trivial $g(T) = 0$, the family is said to be complete.

Example - Poisson distribution

Problem
• Suppose $\mathcal{T} = \left\{ f_T : f_T(t \mid \lambda) = \frac{\lambda^t e^{-\lambda}}{t!} \text{ for } t \in \{0, 1, 2, \cdots\} \right\}$. Let $\lambda \in \Omega = \{1, 2\}$. Show that this family is NOT complete.

Proof strategy
• We need to find a counterexample,
• which is a function $g$ such that $E[g(T) \mid \lambda] = 0$ for $\lambda = 1, 2$ but $g(T) \neq 0$.
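A candidate counterexample can be checked numerically. The sketch below is illustrative only: the helper names are assumptions, and the candidate $g(0) = 2$, $g(1) = -3$, $g(2) = 2$ (zero elsewhere) is the one used in the proof on the following slides. It evaluates $E[g(T) \mid \lambda]$ for $\lambda = 1, 2$, and also for $\lambda = 3$, which is outside $\Omega$.

```python
import math

def poisson_pmf(t, lam):
    """Poisson pmf: lam^t e^{-lam} / t!"""
    return lam ** t * math.exp(-lam) / math.factorial(t)

def expected_g(g, lam):
    """E[g(T) | lam] for T ~ Poisson(lam); g is a dict t -> g(t) with finite support."""
    return sum(value * poisson_pmf(t, lam) for t, value in g.items())

# Candidate from the proof: g(0) = 2, g(1) = -3, g(2) = 2, and 0 otherwise.
g = {0: 2, 1: -3, 2: 2}

for lam in [1, 2, 3]:
    print(f"lambda = {lam}: E[g(T)] = {expected_g(g, lam):+.6f}")
```

The expectation vanishes (up to rounding) at $\lambda = 1$ and $\lambda = 2$ but not at $\lambda = 3$, which illustrates Stigler's point: enlarging the family places more restrictions on $g$.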


Poisson distribution example : Proof

The function $g$ must satisfy

$$E[g(T) \mid \lambda] = \sum_{t=0}^{\infty} g(t) \frac{\lambda^t e^{-\lambda}}{t!} = 0$$

for $\lambda \in \{1, 2\}$. Thus,

$$\begin{cases} E[g(T) \mid \lambda = 1] = \sum_{t=0}^{\infty} g(t) \frac{1^t e^{-1}}{t!} = 0 \\ E[g(T) \mid \lambda = 2] = \sum_{t=0}^{\infty} g(t) \frac{2^t e^{-2}}{t!} = 0 \end{cases}$$

The above equations can be rewritten as

$$\begin{cases} \sum_{t=0}^{\infty} g(t)/t! = 0 \\ \sum_{t=0}^{\infty} 2^t g(t)/t! = 0 \end{cases}$$

Poisson distribution example : Proof (cont'd)

Define $g(t)$ as

$$g(t) = \begin{cases} 2 & t = 0 \vee t = 2 \\ -3 & t = 1 \\ 0 & \text{otherwise} \end{cases}$$

Then

$$\sum_{t=0}^{\infty} g(t)/t! = g(0)/0! + g(1)/1! + g(2)/2! = 2 - 3 + 2/2 = 0$$

$$\sum_{t=0}^{\infty} 2^t g(t)/t! = g(0)/0! + 2g(1)/1! + 2^2 g(2)/2! = 2 - 6 + 8/2 = 0$$

There exists a non-zero function $g$ that satisfies $E[g(T) \mid \lambda] = 0$ for all $\lambda \in \Omega$. Therefore this family is NOT complete.

Another example with Poisson distribution

Problem
• $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \text{Poisson}(\lambda)$, $\lambda > 0$.
• Show that $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a complete statistic.

Proof strategy
• Need to find the distribution of $T(\mathbf{X})$.
• Show that there is no non-zero function $g$ such that $E[g(T) \mid \lambda] = 0$ for all $\lambda$.

Proof : Finding the moment-generating function of X

$$\begin{aligned}
M_X(s) = E[e^{sX}] &= \sum_{x=0}^{\infty} e^{sx} \frac{e^{-\lambda} \lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} e^{-\lambda} e^{e^s\lambda} \frac{e^{-e^s\lambda} (e^s \lambda)^x}{x!} \\
&= e^{-\lambda} e^{e^s \lambda} \sum_{x=0}^{\infty} \frac{e^{-e^s\lambda} (e^s\lambda)^x}{x!} \\
&= e^{-\lambda} e^{e^s \lambda} \sum_{x=0}^{\infty} f_{\text{Poisson}}(x \mid e^s \lambda) \\
&= e^{\lambda(e^s - 1)}
\end{aligned}$$
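The closed form $M_X(s) = e^{\lambda(e^s - 1)}$ can be checked against a direct evaluation of the defining series. This is a rough numerical sketch under stated assumptions: the function names, the truncation at 100 terms, and the test values of `s` and `lam` are illustrative choices, not part of the lecture.

```python
import math

def poisson_mgf_series(s, lam, terms=100):
    """Truncated series: sum over x < terms of e^{s x} e^{-lam} lam^x / x!."""
    total = 0.0
    term = math.exp(-lam)  # x = 0 term
    for x in range(terms):
        total += term
        term *= math.exp(s) * lam / (x + 1)  # ratio of consecutive terms
    return total

def poisson_mgf_closed(s, lam):
    """Closed form derived above: exp(lam * (e^s - 1))."""
    return math.exp(lam * (math.exp(s) - 1))

for s, lam in [(0.0, 2.0), (0.5, 2.0), (-1.0, 3.5)]:
    print(f"s = {s:+.1f}, lambda = {lam}: "
          f"series = {poisson_mgf_series(s, lam):.8f}, "
          f"closed form = {poisson_mgf_closed(s, lam):.8f}")
```

Updating each term from the previous one avoids computing large factorials and powers separately, which keeps the truncated sum stable for moderate $s$ and $\lambda$.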


Proof : Finding the MGF of $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$

$$\begin{aligned}
M_T(s) = E\left(e^{s \sum X_i}\right) &= E\left(\prod_{i=1}^{n} e^{sX_i}\right) \\
&= \prod_{i=1}^{n} E\left(e^{sX_i}\right) = \left[E\left(e^{sX_i}\right)\right]^n \\
&= \left[e^{\lambda(e^s - 1)}\right]^n = e^{n\lambda(e^s - 1)}
\end{aligned}$$

Theorem 2.3.11 (b)
Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist. If the moment generating functions exist and $M_X(t) = M_Y(t)$ for all $t$ in some neighborhood of 0, then $F_X(u) = F_Y(u)$ for all $u$.

By Theorem 2.3.11, $T(\mathbf{X}) \sim \text{Poisson}(n\lambda)$.

Proof : Showing $E[g(T) \mid \lambda] = 0 \iff \Pr[g(T) = 0] = 1$

Suppose that there exists a $g(T)$ such that $E[g(T) \mid \lambda] = 0$ for all $\lambda > 0$.

$$E[g(T) \mid \lambda] = \sum_{t=0}^{\infty} g(t) \frac{e^{-n\lambda} (n\lambda)^t}{t!} = e^{-n\lambda} \sum_{t=0}^{\infty} \frac{g(t)(n\lambda)^t}{t!} = 0$$

This is equivalent to

$$\sum_{t=0}^{\infty} \frac{g(t) n^t}{t!} \lambda^t = 0$$

for all $\lambda > 0$. Because the function above is a power series in $\lambda$, every coefficient must vanish: $g(t) n^t / t! = 0$ for all $t$, and hence $g(t) = 0$ for all $t$. Therefore $T(\mathbf{X}) = \sum_{i=1}^{n} X_i$ is a complete statistic.

Example : Uniform Distribution

Problem
Let $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(0, \theta)$, $\theta > 0$, $\Omega = (0, \infty)$. Show that $X_{(n)}$ is complete.

Proof
We need to obtain the distribution of $T(\mathbf{X}) = X_{(n)}$. Let $f_X(x) = \frac{1}{\theta} I(0 < x < \theta)$; then its cdf is $F_X(x) = \frac{x}{\theta} I(0 < x < \theta) + I(x \geq \theta)$.

$$\begin{aligned}
f_T(t \mid \theta) &= \frac{n!}{(n-1)!} f_X(t) F_X(t)^{n-1} \\
&= \frac{n}{\theta} \left(\frac{t}{\theta}\right)^{n-1} I(0 < t < \theta) = n\theta^{-n} t^{n-1} I(0 < t < \theta)
\end{aligned}$$

Proof : Uniform Distribution (cont'd)

Consider a function $g(T)$ such that $E[g(T) \mid \theta] = 0$ for all $\theta > 0$.

$$\begin{aligned}
E[g(T) \mid \theta] &= \int_0^{\theta} g(t)\, n\theta^{-n} t^{n-1} I(0 < t < \theta)\, dt \\
&= \frac{n}{\theta^n} \int_0^{\theta} g(t) t^{n-1}\, dt = 0
\end{aligned}$$

Taking the derivative of both sides with respect to $\theta$,

$$\frac{n}{\theta^n} g(\theta) \theta^{n-1} - \frac{n^2}{\theta^{n+1}} \int_0^{\theta} g(t) t^{n-1}\, dt = 0$$

$$\frac{n g(\theta)}{\theta} = \frac{n}{\theta} \cdot \frac{n}{\theta^n} \int_0^{\theta} g(t) t^{n-1}\, dt = \frac{n}{\theta} E[g(T) \mid \theta] = 0$$

Because $g(\theta) = 0$ holds for all $\theta > 0$, $g(T) = 0$ almost surely, and $T(\mathbf{X}) = X_{(n)}$ is a complete statistic.
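The density $f_T(t \mid \theta) = n\theta^{-n} t^{n-1}$ on $(0, \theta)$ can be used to evaluate $E[g(T) \mid \theta]$ numerically. The sketch below is illustrative: the midpoint rule, the step count, `n = 4`, and the particular nonzero $g(t) = t - 1.6$ are assumptions chosen for the demonstration, not part of the lecture. It shows a nonzero $g$ whose expectation vanishes at one $\theta$ but not at others, which is consistent with completeness of the whole family.

```python
def expected_g(g, theta, n, steps=100_000):
    """Midpoint-rule approximation of E[g(T) | theta] for T = X_(n),
    using f_T(t | theta) = n * theta**(-n) * t**(n - 1) on (0, theta)."""
    h = theta / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += g(t) * n * theta ** (-n) * t ** (n - 1)
    return total * h

n = 4  # illustrative sample size

def g(t):
    """Hypothetical nonzero g, chosen so that E[g(T) | theta = 2] = 0 when n = 4."""
    return t - 1.6

for theta in [1.0, 2.0, 3.0]:
    mass = expected_g(lambda t: 1.0, theta, n)
    print(f"theta = {theta}: total mass = {mass:.6f}, E[g(T)] = {expected_g(g, theta, n):+.6f}")
```

The first column confirms that the density integrates to one; the second shows that this $g$ satisfies $E[g(T) \mid \theta] = 0$ only at $\theta = 2$, so it does not contradict the proof above.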


A simpler proof (how it was solved in the class)

Consider a function $g(T)$ such that $E[g(T) \mid \theta] = 0$ for all $\theta > 0$.

$$\begin{aligned}
E[g(T) \mid \theta] &= \int_0^{\theta} g(t)\, n\theta^{-n} t^{n-1} I(0 < t < \theta)\, dt \\
&= \frac{n}{\theta^n} \int_0^{\theta} g(t) t^{n-1}\, dt = 0
\end{aligned}$$

$$\int_0^{\theta} g(t) t^{n-1}\, dt = 0$$

Taking the derivative of both sides with respect to $\theta$,

$$g(\theta)\theta^{n-1} = 0$$

$$g(\theta) = 0$$

for all $\theta > 0$. Because $g(\theta) = 0$ holds for all $\theta > 0$, $g(T) = 0$ almost surely, and $T(\mathbf{X}) = X_{(n)}$ is a complete statistic.

Another Example of Uniform Distribution

Problem
• Let $X_1, \cdots, X_n \overset{\text{i.i.d.}}{\sim} \text{Uniform}(\theta, \theta + 1)$, $\theta \in \mathbb{R}$.
• We have previously shown that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is a minimal sufficient statistic for $\theta$.
• Show that $T(\mathbf{X})$ is not a complete statistic.

Proof - Using a statistic
Define $R = X_{(n)} - X_{(1)}$. We have previously shown that

$$f_R(r \mid \theta) = n(n-1) r^{n-2} (1 - r), \quad 0 < r < 1$$

Then $R \sim \text{Beta}(n-1, 2)$, and $E[R \mid \theta] = \frac{n-1}{n+1}$.

Proof

Define $g(T(\mathbf{X})) = X_{(n)} - X_{(1)} - \frac{n-1}{n+1}$.

$$\begin{aligned}
E[g(T) \mid \theta] &= E[X_{(n)} - X_{(1)} \mid \theta] - \frac{n-1}{n+1} \\
&= \frac{n-1}{n+1} - \frac{n-1}{n+1} = 0
\end{aligned}$$

Therefore, there exists a $g(T)$ such that $E[g(T) \mid \theta] = 0$ but $\Pr[g(T) = 0 \mid \theta] < 1$ for all $\theta$, so $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is not a complete statistic.

Even Simpler Proof

• We know that $R = X_{(n)} - X_{(1)}$ is an ancillary statistic, so its distribution does not depend on $\theta$.
• Define $g(T) = X_{(n)} - X_{(1)} - E(R)$. Note that $E(R)$ is constant in $\theta$.
• Then $E[g(T) \mid \theta] = E(R) - E(R) = 0$, while $g(T)$ is not almost surely 0, so $T$ is not a complete statistic.
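The counterexample $g(T) = X_{(n)} - X_{(1)} - \frac{n-1}{n+1}$ can be illustrated by simulation. This is a sketch under stated assumptions (the helper name `simulate_g`, `n = 5`, the replication count, and the seeds are illustrative, not from the lecture). For each $\theta$ it reports the sample mean of $g(T)$, which should be near 0, and the fraction of draws where $g(T)$ is exactly 0, which should be essentially nil.

```python
import random

def simulate_g(theta, n=5, reps=20000, seed=0):
    """Simulate g(T) = X_(n) - X_(1) - (n - 1)/(n + 1) for Uniform(theta, theta + 1) samples.

    Returns (sample mean of g, fraction of draws with g exactly 0).
    n, reps, and seed are illustrative choices.
    """
    rng = random.Random(seed)
    centre = (n - 1) / (n + 1)
    values = []
    for _ in range(reps):
        sample = [rng.uniform(theta, theta + 1) for _ in range(n)]
        values.append(max(sample) - min(sample) - centre)
    mean_g = sum(values) / reps
    frac_zero = sum(1 for v in values if v == 0.0) / reps
    return mean_g, frac_zero

results = {theta: simulate_g(theta, seed=i)
           for i, theta in enumerate([-5.0, 0.0, 2.5])}
for theta, (mean_g, frac_zero) in results.items():
    print(f"theta = {theta:5.1f}  mean g(T) = {mean_g:+.4f}  Pr[g(T) = 0] approx {frac_zero:.4f}")
```

A mean near 0 together with a near-zero fraction of exact zeros is what the definition of completeness forbids, so the simulation agrees with the conclusion that $T$ is not complete.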


Example from Stigler (1972) Am. Stat.

Problem
Let $X$ be a uniform random sample from $\{1, \cdots, \theta\}$ where $\theta \in \Omega = \mathbb{N}$. Is $T(X) = X$ a complete statistic?

Solution
Consider a function $g(T)$ such that $E[g(T) \mid \theta] = 0$ for all $\theta \in \mathbb{N}$. Note that $f_X(x) = \frac{1}{\theta} I(x \in \{1, \cdots, \theta\}) = \frac{1}{\theta} I_{\mathbb{N}_\theta}(x)$.

$$E[g(T) \mid \theta] = E[g(X) \mid \theta] = \sum_{x=1}^{\theta} \frac{1}{\theta} g(x) = \frac{1}{\theta} \sum_{x=1}^{\theta} g(x) = 0$$

$$\sum_{x=1}^{\theta} g(x) = 0$$

Solution (cont'd)

for all $\theta \in \mathbb{N}$, which implies
• if $\theta = 1$, $\sum_{x=1}^{\theta} g(x) = g(1) = 0$.
• if $\theta = 2$, $\sum_{x=1}^{\theta} g(x) = g(1) + g(2) = g(2) = 0$.
• $\vdots$
• if $\theta = k$, $\sum_{x=1}^{\theta} g(x) = g(1) + \cdots + g(k-1) + g(k) = g(k) = 0$.

Therefore, $g(x) = 0$ for all $x \in \mathbb{N}$, and $T(X) = X$ is a complete statistic for $\theta \in \Omega = \mathbb{N}$.
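The inductive argument amounts to inverting the map from $g$ to its expectations: since $\theta \, E[g(X) \mid \theta] = \sum_{x=1}^{\theta} g(x)$, each value is recovered as $g(k) = k E_k - (k-1) E_{k-1}$. The sketch below makes this concrete with exact rational arithmetic; the function names `expectation` and `recover_g` and the sample $g$ are hypothetical, introduced only for illustration.

```python
from fractions import Fraction

def expectation(g, theta):
    """E[g(X) | theta] for X uniform on {1, ..., theta}; g is a dict x -> g(x)."""
    return Fraction(sum(g.get(x, 0) for x in range(1, theta + 1)), theta)

def recover_g(expectations):
    """Invert the map g -> (E_1, ..., E_K), where expectations[k - 1] = E[g(X) | theta = k].

    Uses g(k) = k * E_k - (k - 1) * E_{k-1}, i.e. differences of partial sums.
    """
    g = {}
    prev_sum = Fraction(0)
    for k, e in enumerate(expectations, start=1):
        partial_sum = k * Fraction(e)
        g[k] = partial_sum - prev_sum
        prev_sum = partial_sum
    return g

# If every expectation is zero, the only possible g is identically zero.
print(recover_g([0, 0, 0, 0, 0]))

# Round trip for an arbitrary nonzero g (illustrative values).
g_example = {1: 2, 2: -1, 3: 5}
es = [expectation(g_example, k) for k in range(1, 4)]
print(es, recover_g(es))
```

The inversion needs every $\theta = 1, \cdots, K$ to be available; if some value of $\theta$ were missing from $\Omega$, the corresponding partial sum would be unconstrained.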

Is the previous example barely complete?

Modified Problem
Let $X$ be a uniform random sample from $\{1, \cdots, \theta\}$ where $\theta \in \Omega = \mathbb{N} - \{n\}$. Is $T(X) = X$ a complete statistic?

Solution
Define a nonzero $g(x)$ as follows

$$g(x) = \begin{cases} 1 & x = n \\ -1 & x = n + 1 \\ 0 & \text{otherwise} \end{cases}$$

$$E[g(T) \mid \theta] = \frac{1}{\theta} \sum_{x=1}^{\theta} g(x) = \begin{cases} 0 & \theta \neq n \\ \frac{1}{\theta} & \theta = n \end{cases}$$

Because $\Omega$ does not include $n$, $E[g(T) \mid \theta] = 0$ for all $\theta \in \Omega = \mathbb{N} - \{n\}$ even though $g$ is nonzero, so $T(X) = X$ is not a complete statistic.

Summary

Today - Complete Statistics
• Examples of complete statistics
• Two Poisson distribution examples
• Two Uniform distribution examples
• Example of barely complete statistics

Next Lecture
• Basu's Theorem