
Biostatistics 602 - Statistical Inference
Lecture 15
Hyun Min Kang
March 12th, 2013

Last Lecture

• Can the Cramer-Rao bound be used to find the best unbiased estimator for any distribution? If not, in which cases?
• When the Cramer-Rao bound is attainable, can it be used to find the best unbiased estimator for any τ(θ)? If not, what is the restriction on τ(θ)?
• What is another way to find the best unbiased estimator?
• Describe two strategies to obtain the best unbiased estimator for τ(θ), using complete sufficient statistics.


Recap - The power of complete sufficient statistics

Theorem 7.3.23
Let T be a complete sufficient statistic for θ, and let ϕ(T) be any estimator based only on T. Then ϕ(T) is the unique best unbiased estimator of its expected value.

Finding UMVUE - Method 1

Use the Cramer-Rao bound to find the best unbiased estimator for τ(θ).

1. If "regularity conditions" are satisfied, then we have a Cramer-Rao lower bound for unbiased estimators of τ(θ).
   • It helps to confirm that an estimator is the best unbiased estimator of τ(θ) if it happens to attain the CR bound.
   • If an unbiased estimator of τ(θ) has variance greater than the CR bound, it does NOT mean that it is not the best unbiased estimator.
2. When "regularity conditions" are not satisfied, [τ′(θ)]²/Iₙ(θ) is no longer a valid lower bound.
   • There may be unbiased estimators of τ(θ) that have variance smaller than [τ′(θ)]²/Iₙ(θ).
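As a quick numerical illustration of point 1 (my own sketch, not from the slides): for i.i.d. Bernoulli(p) samples, the sample mean is unbiased for τ(p) = p and attains the CR bound [τ′(p)]²/Iₙ(p) = p(1−p)/n. A minimal simulation, assuming NumPy and illustrative values of n and p:

```python
import numpy as np

# Hypothetical illustration: the sample mean of iid Bernoulli(p) draws is
# unbiased for tau(p) = p and attains the Cramer-Rao bound p(1-p)/n.
rng = np.random.default_rng(0)
n, p, reps = 50, 0.3, 200_000

xbar = rng.binomial(n, p, size=reps) / n   # 200k replicates of the sample mean
cr_bound = p * (1 - p) / n                 # [tau'(p)]^2 / I_n(p)

print(f"empirical Var(X-bar) = {xbar.var():.6f}")
print(f"Cramer-Rao bound     = {cr_bound:.6f}")   # the two agree closely
```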

Finding UMVUE - Method 2

Use a complete sufficient statistic to find the best unbiased estimator for τ(θ).

1. Find a complete sufficient statistic T for θ.
2. Obtain ϕ(T), an unbiased estimator of τ(θ), in either of the following two ways:
   • Guess a function ϕ(T) such that E[ϕ(T)] = τ(θ).
   • Guess an unbiased estimator h(X) of τ(θ), and construct ϕ(T) = E[h(X)|T]; then E[ϕ(T)] = E[h(X)] = τ(θ).
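To make step 2 concrete, here is a small sketch (my own illustration, not part of the slides) for the Bernoulli case: starting from the crude unbiased estimator h(X) = X₁ of p, conditioning on the complete sufficient statistic T = ΣXᵢ gives ϕ(T) = E[X₁|T] = T/n, the sample mean, which has far smaller variance:

```python
import numpy as np

# Hypothetical Bernoulli illustration of phi(T) = E[h(X) | T].
# h(X) = X_1 is unbiased for p; T = sum(X_i) is complete sufficient, and
# E[X_1 | T] = T/n by symmetry, so phi(T) is the sample mean.
rng = np.random.default_rng(1)
n, p, reps = 20, 0.4, 100_000

x = rng.binomial(1, p, size=(reps, n))
h = x[:, 0]           # crude unbiased estimator X_1
phi = x.mean(axis=1)  # Rao-Blackwellized estimator T/n

print(f"E[h]   = {h.mean():.4f}   Var(h)   = {h.var():.5f}")    # ~ p(1-p)
print(f"E[phi] = {phi.mean():.4f}   Var(phi) = {phi.var():.5f}")  # ~ p(1-p)/n
```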

Frequentists vs. Bayesians

A (biased) view in favor of Bayesians: http://xkcd.com/1132/

Bayesian Statistics

Frequentist's Framework
P = {X ∼ fX(x|θ), θ ∈ Ω}

Bayesian Statistics
• The parameter θ is considered a random quantity.
• The distribution of θ can be described by a probability distribution, referred to as the prior distribution.
• A sample is taken from a population indexed by θ, and the prior distribution is updated using information from the sample to get the posterior distribution of θ given the sample.

Bayesian Framework

• Prior distribution of θ: θ ∼ π(θ).
• Sampling distribution of X given θ: X|θ ∼ f(x|θ).
• Joint distribution of X and θ: f(x, θ) = π(θ) f(x|θ).
• Marginal distribution of X:

$$ m(x) = \int_{\theta \in \Omega} f(x, \theta)\, d\theta = \int_{\theta \in \Omega} f(x \mid \theta)\, \pi(\theta)\, d\theta $$

• Posterior distribution of θ (conditional distribution of θ given X), by Bayes' rule:

$$ \pi(\theta \mid x) = \frac{f(x, \theta)}{m(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)} $$
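All five ingredients can be checked numerically on a grid. A minimal sketch (my own, with a hypothetical flat prior and toy data) for Bernoulli observations:

```python
import numpy as np

# Grid approximation of the Bayesian pipeline for Bernoulli data with a
# hypothetical Uniform(0,1) prior and a toy sample.
theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)                # pi(theta): flat prior density

x = np.array([1, 0, 1, 1, 0])              # toy observed sample
lik = theta**x.sum() * (1 - theta)**(len(x) - x.sum())   # f(x|theta)

joint = lik * prior                        # f(x, theta) = f(x|theta) pi(theta)
m_x = (joint * dtheta).sum()               # marginal m(x), Riemann sum
posterior = joint / m_x                    # pi(theta|x) by Bayes' rule

print(f"E[theta|x] = {(theta * posterior * dtheta).sum():.4f}")  # ~ 4/7
```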

Example

Suppose that Burglary is an unobserved parameter (θ ∈ {0, 1}), and Alarm is an observed outcome (X ∈ {0, 1}).

  Burglary (θ)    Pr(Alarm | Burglary) = Pr(X = 1 | θ)
  True  (θ = 1)   0.95
  False (θ = 0)   0.01

Under the Frequentist's Framework:
• If there was no burglary, there is a 1% chance of the alarm ringing.
• If there was a burglary, there is a 95% chance of the alarm ringing.
• One can come up with an estimator of θ, such as the MLE.
• However, given that the alarm already rang, one cannot calculate the probability of burglary.

Inference Under the Bayesian Framework

Leveraging Prior Information
Suppose that we know that the chance of burglary per household per night is 10⁻⁷, and take this as the prior Pr(θ = 1). Then, by Bayes' rule,

$$
\begin{aligned}
\Pr(\theta = 1 \mid X = 1) &= \frac{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1)}{\Pr(X = 1)} \\
&= \frac{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1)}{\Pr(\theta = 1, X = 1) + \Pr(\theta = 0, X = 1)} \\
&= \frac{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1)}{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1) + \Pr(X = 1 \mid \theta = 0)\Pr(\theta = 0)} \\
&= \frac{0.95 \times 10^{-7}}{0.95 \times 10^{-7} + 0.01 \times (1 - 10^{-7})} \approx 9.5 \times 10^{-6}
\end{aligned}
$$

So, even if the alarm rang, one can conclude that a burglary is unlikely to have happened.
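This computation is easy to reproduce directly:

```python
# Reproducing the slide's numbers: Pr(theta=1 | X=1) via Bayes' rule.
prior = 1e-7           # Pr(theta=1): burglary rate per household per night
p1, p0 = 0.95, 0.01    # Pr(X=1 | theta=1), Pr(X=1 | theta=0)

posterior = p1 * prior / (p1 * prior + p0 * (1 - prior))
print(f"Pr(burglary | alarm) = {posterior:.3e}")   # ~ 9.5e-06
```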


What if the prior information is misleading?

Over-fitting to Prior Information
Suppose that, in fact, a thief found a security breach in my place and is planning to break in either tonight or tomorrow night for sure (with the same probability). Then the correct prior is Pr(θ = 1) = 0.5, and

$$
\Pr(\theta = 1 \mid X = 1) = \frac{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1)}{\Pr(X = 1 \mid \theta = 1)\Pr(\theta = 1) + \Pr(X = 1 \mid \theta = 0)\Pr(\theta = 0)} = \frac{0.95 \times 0.5}{0.95 \times 0.5 + 0.01 \times (1 - 0.5)} \approx 0.99
$$

However, if we relied on inference based on the incorrect prior, we might conclude that there is a > 99.9% chance that this is a false alarm and ignore it, exchanging one night of good sleep for quite a bit of fortune.

Advantages and Drawbacks of Bayesian Inference

Advantages over the Frequentist's Framework
• Allows making inference on the distribution of θ given data.
• Available prior information about θ can be utilized.
• Uncertainty and information can be quantified probabilistically.

Drawbacks of Bayesian Inference
• A misleading prior can result in misleading inference.
• Bayesian inference is often (but not always) prone to be "subjective".
  • See: Larry Wasserman, "Frequentist Bayes is Objective" (2006), Bayesian Analysis 3:451-456.
• Bayesian inference can sometimes be unnecessarily complicated to interpret, compared to Frequentist inference.
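Returning to the alarm example above, the same Bayes-rule computation run under both priors makes the sensitivity explicit (a small sketch; `posterior_burglary` is my own helper name):

```python
# Sensitivity of the posterior to the prior in the alarm example.
# posterior_burglary is a hypothetical helper name for this sketch.
def posterior_burglary(prior, p1=0.95, p0=0.01):
    """Pr(theta=1 | X=1) as a function of the prior Pr(theta=1)."""
    return p1 * prior / (p1 * prior + p0 * (1 - prior))

print(f"misleading prior 1e-7: {posterior_burglary(1e-7):.3e}")  # ~ 9.5e-06
print(f"correct prior 0.5:     {posterior_burglary(0.5):.4f}")   # ~ 0.9896
```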

Bayes Estimator

Definition
The Bayes estimator of θ is defined as the posterior mean of θ:

$$ E(\theta \mid x) = \int_{\theta \in \Omega} \theta\, \pi(\theta \mid x)\, d\theta $$

Example Problem
X₁, …, Xₙ ~ i.i.d. Bernoulli(p), where 0 ≤ p ≤ 1. Assume that the prior distribution of p is Beta(α, β). Find the posterior distribution of p and the Bayes estimator of p, assuming α and β are known.

Solution (1/4)

The prior distribution of p is

$$ \pi(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1} $$

The sampling distribution of X given p is

$$ f_X(x \mid p) = \prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i} $$

The joint distribution of X and p is

$$ f_X(x, p) = f_X(x \mid p)\,\pi(p) = \left[\prod_{i=1}^{n} p^{x_i}(1-p)^{1-x_i}\right] \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1} $$


Solution (2/4)

The marginal distribution of X is

$$
\begin{aligned}
m(x) &= \int_0^1 f(x, p)\, dp = \int_0^1 \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\sum x_i + \alpha - 1}(1-p)^{n - \sum x_i + \beta - 1}\, dp \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma\!\left(\sum x_i + \alpha\right)\Gamma\!\left(n - \sum x_i + \beta\right)}{\Gamma(\alpha+\beta+n)} \int_0^1 f_{\mathrm{Beta}\left(\sum x_i + \alpha,\; n - \sum x_i + \beta\right)}(p)\, dp \\
&= \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma\!\left(\sum x_i + \alpha\right)\Gamma\!\left(n - \sum x_i + \beta\right)}{\Gamma(\alpha+\beta+n)}
\end{aligned}
$$

Solution (3/4)

The posterior distribution of p given x is

$$
\begin{aligned}
\pi(p \mid x) &= \frac{f(x, p)}{m(x)} = \frac{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\sum x_i + \alpha - 1}(1-p)^{n - \sum x_i + \beta - 1}}{\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \dfrac{\Gamma\!\left(\sum x_i + \alpha\right)\Gamma\!\left(n - \sum x_i + \beta\right)}{\Gamma(\alpha+\beta+n)}} \\
&= \frac{\Gamma(\alpha+\beta+n)}{\Gamma\!\left(\sum x_i + \alpha\right)\Gamma\!\left(n - \sum x_i + \beta\right)}\, p^{\sum x_i + \alpha - 1}(1-p)^{n - \sum x_i + \beta - 1}
\end{aligned}
$$

which is the Beta(Σxᵢ + α, n − Σxᵢ + β) distribution.
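The closed-form answer can be cross-checked against a brute-force grid computation; a sketch assuming SciPy is available, with illustrative values of α, β and toy data:

```python
import numpy as np
from scipy.stats import beta

# Grid check that pi(p|x) = Beta(sum(x)+alpha, n-sum(x)+beta);
# alpha, beta, and the data are illustrative.
a, b = 2.0, 3.0
x = np.array([1, 1, 0, 1, 0, 0, 1, 1])
n, s = len(x), x.sum()

p = np.linspace(0.001, 0.999, 999)
dp = p[1] - p[0]
joint = p**s * (1 - p)**(n - s) * beta.pdf(p, a, b)   # f(x|p) pi(p)
grid_post = joint / (joint * dp).sum()                # pi(p|x) on the grid

closed_form = beta.pdf(p, s + a, n - s + b)
# Small up to grid discretization error:
print(f"max |grid - closed form| = {np.abs(grid_post - closed_form).max():.1e}")
```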

Solution (4/4)

The Bayes estimator of p is the posterior mean:

$$
\begin{aligned}
\hat{p} &= \frac{\sum_{i=1}^n x_i + \alpha}{\left(\sum_{i=1}^n x_i + \alpha\right) + \left(n - \sum_{i=1}^n x_i + \beta\right)} = \frac{\sum_{i=1}^n x_i + \alpha}{\alpha + \beta + n} \\
&= \frac{\sum_{i=1}^n x_i}{n} \cdot \frac{n}{\alpha+\beta+n} + \frac{\alpha}{\alpha+\beta} \cdot \frac{\alpha+\beta}{\alpha+\beta+n} \\
&= [\text{guess about } p \text{ from data}] \cdot \text{weight}_1 + [\text{guess about } p \text{ from prior}] \cdot \text{weight}_2
\end{aligned}
$$

As n increases, weight₁ = n/(α+β+n) = 1/(1 + (α+β)/n) becomes bigger and bigger and approaches 1. In other words, the influence of the data increases while the influence of the prior knowledge decreases.

Is the Bayes estimator unbiased?

$$
E\left[\frac{\sum_{i=1}^n x_i + \alpha}{\alpha+\beta+n}\right] = \frac{np + \alpha}{\alpha+\beta+n} \neq p \quad \text{unless } \frac{\alpha}{\alpha+\beta} = p
$$

$$
\mathrm{Bias} = \frac{np + \alpha}{\alpha+\beta+n} - p = \frac{\alpha - (\alpha+\beta)p}{\alpha+\beta+n}
$$

As n increases, the bias approaches zero.
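The exact bias and weight formulas are easy to tabulate; a short sketch with illustrative prior parameters:

```python
# Exact bias and data weight of the Bayes estimator, with illustrative
# prior parameters; both follow the closed forms derived above.
a, b, p = 2.0, 2.0, 0.3   # alpha, beta, true p

for n in (5, 50, 500, 5000):
    bias = (a - (a + b) * p) / (a + b + n)  # (alpha-(alpha+beta)p)/(alpha+beta+n)
    w1 = n / (a + b + n)                    # weight on the sample mean
    print(f"n={n:5d}   bias={bias:+.5f}   data weight={w1:.4f}")
```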


Sufficient statistic and posterior distribution

Posterior conditioning on sufficient statistics
If T(X) is a sufficient statistic, then the posterior distribution of θ given X is the same as the posterior distribution of θ given T(X). In other words,

π(θ|x) = π(θ|T(x))

Conjugate family

Definition 7.2.15
Let F denote the class of pdfs or pmfs f(x|θ). A class Π of prior distributions is a conjugate family for F if the posterior distribution is in the class Π for all f ∈ F, all priors in Π, and all x ∈ X.
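A minimal sketch of the first fact for the Bernoulli/Beta pair (my own illustration): two samples with the same value of T(x) = Σxᵢ produce numerically identical posteriors.

```python
import numpy as np
from scipy.stats import beta

# Two toy Bernoulli samples with the same sufficient statistic T(x) = sum(x).
def grid_posterior(x, a=2.0, b=2.0):
    p = np.linspace(0.001, 0.999, 999)
    joint = p**sum(x) * (1 - p)**(len(x) - sum(x)) * beta.pdf(p, a, b)
    return joint / (joint * (p[1] - p[0])).sum()

x1 = [1, 1, 0, 0, 0]
x2 = [0, 0, 1, 0, 1]   # different sample, same T(x) = 2 and n = 5
diff = np.abs(grid_posterior(x1) - grid_posterior(x2)).max()
print(f"max |pi(p|x1) - pi(p|x2)| = {diff}")   # exactly 0.0
```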

Example: Beta-Binomial conjugate

Let
• X₁, …, Xₙ | p ∼ Binomial(m, p)
• π(p) ∼ Beta(α, β)
where m, α, β are known. The posterior distribution is

$$ \pi(p \mid x) \sim \mathrm{Beta}\!\left(\sum_{i=1}^n x_i + \alpha,\; mn - \sum_{i=1}^n x_i + \beta\right) $$

Example: Gamma-Poisson conjugate

• X₁, …, Xₙ | λ ∼ Poisson(λ)
• π(λ) ∼ Gamma(α, β)
• Prior:

$$ \pi(\lambda) = \frac{1}{\Gamma(\alpha)\beta^\alpha}\, \lambda^{\alpha-1} e^{-\lambda/\beta} $$

• Likelihood: X | λ ~ i.i.d. e^{−λ}λˣ/x!, so

$$ f_X(x \mid \lambda) = \prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} $$
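Before continuing with the Gamma-Poisson derivation, the Beta-Binomial posterior above can be verified on a grid; a sketch assuming SciPy, with illustrative m, α, β, and toy counts:

```python
import numpy as np
from scipy.stats import beta, binom

# Grid check of pi(p|x) = Beta(sum(x)+alpha, mn-sum(x)+beta);
# m, alpha, beta, and the counts are illustrative.
m, a, b = 10, 2.0, 3.0
x = np.array([3, 7, 5, 6])          # X_i | p ~ Binomial(m, p)
n, s = len(x), x.sum()

p = np.linspace(0.001, 0.999, 999)
lik = np.prod([binom.pmf(xi, m, p) for xi in x], axis=0)
joint = lik * beta.pdf(p, a, b)
grid_post = joint / (joint * (p[1] - p[0])).sum()

closed = beta.pdf(p, s + a, m * n - s + b)
print(f"max |grid - closed form| = {np.abs(grid_post - closed).max():.1e}")
```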


Gamma-Poisson conjugate (cont'd)

• Joint distribution of X and λ:

$$
f(x \mid \lambda)\,\pi(\lambda) = \left[\prod_{i=1}^n \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}\right] \frac{1}{\Gamma(\alpha)\beta^\alpha}\, \lambda^{\alpha-1} e^{-\lambda/\beta} = e^{-n\lambda - \lambda/\beta}\, \lambda^{\sum x_i + \alpha - 1} \cdot \frac{1}{\prod_{i=1}^n x_i!\; \Gamma(\alpha)\beta^\alpha}
$$

• Marginal distribution:

$$ m(x) = \int f(x \mid \lambda)\,\pi(\lambda)\, d\lambda $$

• Posterior distribution (proportional to the joint distribution):

$$
\pi(\lambda \mid x) = \frac{f(x \mid \lambda)\,\pi(\lambda)}{m(x)} = \frac{e^{-n\lambda - \lambda/\beta}\, \lambda^{\sum x_i + \alpha - 1}}{\Gamma\!\left(\sum x_i + \alpha\right)\left(\frac{1}{n + 1/\beta}\right)^{\sum x_i + \alpha}}
$$

So the posterior distribution is Gamma(Σxᵢ + α, (n + 1/β)⁻¹).
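The same grid check works for the Gamma-Poisson posterior; a sketch with illustrative values (done on the log scale for numerical stability; the ∏xᵢ! factor cancels in the normalization and is dropped):

```python
import numpy as np
from scipy.stats import gamma

# Grid check of pi(lambda|x) = Gamma(sum(x)+alpha, scale = 1/(n + 1/beta));
# alpha, beta, and the counts are illustrative.
a, b = 2.0, 1.5
x = np.array([3, 1, 4, 2, 2])
n, s = len(x), x.sum()

lam = np.linspace(1e-4, 15, 20_000)
log_joint = -n * lam + s * np.log(lam) + gamma.logpdf(lam, a, scale=b)
joint = np.exp(log_joint - log_joint.max())          # stabilized, unnormalized
grid_post = joint / (joint * (lam[1] - lam[0])).sum()

closed = gamma.pdf(lam, s + a, scale=1 / (n + 1 / b))
print(f"max |grid - closed form| = {np.abs(grid_post - closed).max():.1e}")
```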

Example: Normal Bayes Estimators

Let X ∼ N(θ, σ²) and suppose that the prior distribution of θ is N(µ, τ²). Assuming that σ², µ, τ² are all known, the posterior distribution of θ is also normal, with mean and variance given by

$$
E[\theta \mid x] = \frac{\tau^2}{\tau^2 + \sigma^2}\, x + \frac{\sigma^2}{\sigma^2 + \tau^2}\, \mu, \qquad \mathrm{Var}(\theta \mid x) = \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2}
$$

• The normal family is its own conjugate family.
• The Bayes estimator of θ is a linear combination of the prior mean and the sample.
• As the prior variance τ² approaches infinity, the Bayes estimator tends toward the sample mean.
• As the prior information becomes more vague, the Bayes estimator tends to give more weight to the sample information.
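A short sketch tabulating the posterior mean and variance for increasingly vague priors (illustrative values of x, σ², µ):

```python
# Posterior mean and variance for X ~ N(theta, sigma^2), theta ~ N(mu, tau^2),
# with illustrative values; the mean approaches x as tau^2 grows.
x, sigma2, mu = 1.8, 1.0, 0.0

for tau2 in (0.1, 1.0, 100.0, 1e6):
    post_mean = tau2 / (tau2 + sigma2) * x + sigma2 / (sigma2 + tau2) * mu
    post_var = sigma2 * tau2 / (sigma2 + tau2)
    print(f"tau^2 = {tau2:>9.1f}   E[theta|x] = {post_mean:.4f}   "
          f"Var(theta|x) = {post_var:.4f}")
```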

Summary

Today
• Bayes estimator
• Conjugate family

Next Lecture
• Bayes risk functions
• Consistency