Biostatistics 602 - Lecture 16
Evaluation of Bayes Estimators

Hyun Min Kang
March 14th, 2013

Last Lecture
• What is a Bayes estimator?
• Is a Bayes estimator the best unbiased estimator?
• Compared to other estimators, what are the advantages of a Bayes estimator?
• What is a conjugate family?
• What are the conjugate families of the Binomial, Poisson, and … distributions?

Recap - Bayes Estimator
• θ : parameter of interest
• π(θ) : prior distribution
• X | θ ∼ fX(x | θ) : sampling distribution
• Posterior distribution of θ | x (Bayes' rule):

    π(θ | x) = Joint / Marginal = fX(x | θ) π(θ) / m(x)

  where the marginal distribution is

    m(x) = ∫ f(x | θ) π(θ) dθ

• The Bayes estimator of θ is the posterior mean

    E(θ | x) = ∫_{θ ∈ Ω} θ π(θ | x) dθ

Recap - Example
• X1, …, Xn i.i.d. ∼ Bernoulli(p)
• π(p) ∼ Beta(α, β)
• Prior guess : p̂ = α / (α + β)
• Posterior distribution : π(p | x) ∼ Beta(Σxi + α, n − Σxi + β)
• Bayes estimator :

    p̂ = (α + Σxi) / (α + β + n)
      = (Σxi / n) · n / (α + β + n) + (α / (α + β)) · (α + β) / (α + β + n)

  a weighted average of the sample mean and the prior guess.
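A minimal numerical sketch of this recap (Python; the data, the prior parameters, and the helper name `beta_bernoulli_bayes` are all illustrative, not from the lecture): it computes the Beta posterior parameters and the posterior-mean Bayes estimator, and checks the weighted-average decomposition above.

```python
import numpy as np

def beta_bernoulli_bayes(x, alpha, beta):
    """Posterior Beta parameters and Bayes (posterior-mean) estimator
    for i.i.d. Bernoulli data with a Beta(alpha, beta) prior."""
    x = np.asarray(x)
    n, s = x.size, x.sum()
    post_alpha, post_beta = s + alpha, n - s + beta   # Beta(Σxi + α, n − Σxi + β)
    p_hat = (alpha + s) / (alpha + beta + n)          # posterior mean
    return post_alpha, post_beta, p_hat

x = np.random.default_rng(0).binomial(1, 0.3, size=50)
a_post, b_post, p_hat = beta_bernoulli_bayes(x, alpha=2, beta=2)

# The Bayes estimator is a weighted average of the sample mean and the prior guess.
n = x.size
weighted = x.mean() * n / (2 + 2 + n) + (2 / (2 + 2)) * (2 + 2) / (2 + 2 + n)
assert np.isclose(p_hat, weighted)
print(a_post, b_post, p_hat)
```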

Optimality

Let θˆ be an estimator. The mean squared error (MSE) is defined as

    MSE(θˆ) = E[(θˆ − θ)²]

Let L(θ, θˆ) be a function of θ and θˆ (a loss function).
• If θˆ = θ, the estimator makes a correct decision and the loss is 0.
• If θˆ ≠ θ, it makes a mistake and the loss is not 0.

With squared error loss, the MSE is the average loss,

    MSE = Average Loss = E[L(θ, θˆ)]

which is the expectation of the loss when θˆ is used to estimate θ.

Loss Function
• Squared error loss:

    L(θ, θˆ) = (θˆ − θ)²

• Absolute error loss:

    L(θ, θˆ) = |θˆ − θ|

• A loss that penalizes overestimation more than underestimation:

    L(θ, θˆ) = (θˆ − θ)² I(θˆ < θ) + 10 (θˆ − θ)² I(θˆ ≥ θ)
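A short Python sketch of the three losses on this slide (function names are illustrative), evaluated for one underestimate and one overestimate of θ:

```python
import numpy as np

def squared_error(theta, theta_hat):
    return (theta_hat - theta) ** 2

def absolute_error(theta, theta_hat):
    return np.abs(theta_hat - theta)

def asymmetric_loss(theta, theta_hat):
    # Penalizes overestimation (theta_hat >= theta) 10x more than underestimation.
    err2 = (theta_hat - theta) ** 2
    return np.where(theta_hat < theta, err2, 10 * err2)

theta = 1.0
for theta_hat in (0.8, 1.2):  # underestimate, then overestimate
    print(theta_hat,
          squared_error(theta, theta_hat),
          absolute_error(theta, theta_hat),
          asymmetric_loss(theta, theta_hat))
```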


Risk Function - Average Loss

    R(θ, θˆ) = E[L(θ, θˆ(X)) | θ]

• If L(θ, θˆ) = (θˆ − θ)², then R(θ, θˆ) is the MSE.
• An estimator with smaller R(θ, θˆ) is preferred.

Definition : Bayes Risk
The Bayes risk is defined as the average risk across all values of θ, given the prior π(θ):

    ∫_Ω R(θ, θˆ) π(θ) dθ

The Bayes rule with respect to a prior π is the optimal estimator with respect to the Bayes risk, defined as the estimator that minimizes the Bayes risk.

Alternative definition of Bayes Risk

    ∫_Ω R(θ, θˆ) π(θ) dθ = ∫_Ω E[L(θ, θˆ(X)) | θ] π(θ) dθ
                         = ∫_Ω [ ∫_X f(x | θ) L(θ, θˆ(x)) dx ] π(θ) dθ
                         = ∫_Ω [ ∫_X L(θ, θˆ(x)) f(x | θ) π(θ) dx ] dθ
                         = ∫_Ω [ ∫_X L(θ, θˆ(x)) π(θ | x) m(x) dx ] dθ
                         = ∫_X [ ∫_Ω L(θ, θˆ(x)) π(θ | x) dθ ] m(x) dx
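A Monte Carlo sketch of the Bayes risk in the Beta-Bernoulli setting from the recap (Python; prior parameters, sample size, and seed are illustrative). It draws θ from the prior and data given θ, then averages the squared error loss, comparing the MLE with the Bayes rule:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, n = 2, 2, 20
n_sim = 100_000

# Draw theta from the prior, then data given theta (the double expectation
# that defines the Bayes risk).
p = rng.beta(alpha, beta, size=n_sim)
s = rng.binomial(n, p)                     # sufficient statistic Σxi

mle = s / n
bayes = (s + alpha) / (n + alpha + beta)   # posterior mean

print("estimated Bayes risk of MLE:       ", np.mean((mle - p) ** 2))
print("estimated Bayes risk of Bayes rule:", np.mean((bayes - p) ** 2))
# Under this prior, the Bayes rule should show the smaller estimated Bayes risk.
```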

Posterior Expected Loss

The posterior expected loss is defined as

    ∫_Ω π(θ | x) L(θ, θˆ(x)) dθ

An alternative definition of the Bayes rule estimator is the estimator that minimizes the posterior expected loss.

Bayes Estimator based on squared error loss

For L(θˆ, θ) = (θˆ − θ)²,

    Posterior expected loss = ∫_Ω (θ − θˆ)² π(θ | x) dθ = E[(θ − θˆ)² | X = x]

So the goal is to minimize E[(θ − θˆ)² | X = x].

    E[(θ − θˆ)² | X = x] = E[(θ − E(θ | x) + E(θ | x) − θˆ)² | X = x]
                         = E[(θ − E(θ | x))² | X = x] + E[(E(θ | x) − θˆ)² | X = x]
                         = E[(θ − E(θ | x))² | X = x] + (E(θ | x) − θˆ)²

(the cross term vanishes because E[θ − E(θ | x) | X = x] = 0), which is minimized when θˆ = E(θ | x).
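A quick numerical check of this result (Python; the posterior parameters and seed are illustrative): approximate the posterior expected squared loss over a grid of candidate estimates and confirm the minimizer is close to the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(2)

# Posterior from a Beta-Bernoulli setting (illustrative parameters).
post_alpha, post_beta = 8, 16
theta_samples = rng.beta(post_alpha, post_beta, size=100_000)

# Approximate E[(θ − θ̂)² | X = x] on a grid of candidate estimates θ̂.
grid = np.linspace(0.01, 0.99, 99)
losses = np.array([np.mean((theta_samples - t) ** 2) for t in grid])

print("grid minimizer of posterior expected squared loss:", grid[np.argmin(losses)])
print("posterior mean:", post_alpha / (post_alpha + post_beta))
# The two should agree up to grid spacing and Monte Carlo error.
```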

Summary so far
• Loss function L(θ, θˆ)
  • e.g. (θˆ − θ)², |θˆ − θ|
• Risk function R(θ, θˆ) is the average of L(θ, θˆ) across all x ∈ X.
  • For squared error loss, the risk function is the same as the MSE.
• Bayes risk : the average risk across all θ, based on the prior of θ.
  • Alternatively, the average posterior expected loss across all x ∈ X.
• Bayes estimator θˆ = E[θ | x], based on squared error loss:
  • minimizes the Bayes risk
  • minimizes the posterior expected loss

Bayes Estimator based on absolute error loss

Suppose that L(θ, θˆ) = |θ − θˆ|. The posterior expected loss is

    E[L(θ, θˆ(x)) | X = x] = ∫_Ω |θ − θˆ(x)| π(θ | x) dθ = E[|θ − θˆ| | X = x]
                           = ∫_{−∞}^{θˆ} (θˆ − θ) π(θ | x) dθ + ∫_{θˆ}^{∞} (θ − θˆ) π(θ | x) dθ

Setting ∂/∂θˆ E[L(θ, θˆ(x)) | X = x] = 0 gives Pr(θ ≤ θˆ | x) = Pr(θ > θˆ | x), so θˆ is the posterior median.

Two Bayes Rules

Consider a problem of estimating a real-valued parameter θ.
• For squared error loss, the posterior expected loss is

    ∫_Ω (θ − θˆ)² π(θ | x) dθ = E[(θ − θˆ)² | X = x]

  This is minimized by θˆ = E(θ | x). So the Bayes rule estimator is the mean of the posterior distribution.
• For absolute error loss, the posterior expected loss is E[|θ − θˆ| | X = x]. As shown previously, this is minimized by choosing θˆ as the median of π(θ | x).

Example
• X1, …, Xn i.i.d. ∼ Bernoulli(p)
• π(p) ∼ Beta(α, β)
• The posterior distribution follows Beta(Σxi + α, n − Σxi + β).
• The Bayes estimator that minimizes the posterior expected squared error loss is the posterior mean

    p̂ = (Σxi + α) / (α + β + n)

• The Bayes estimator that minimizes the posterior expected absolute error loss is the posterior median p̂, i.e. the value satisfying

    ∫_0^{p̂} [ Γ(α + β + n) / ( Γ(Σxi + α) Γ(n − Σxi + β) ) ] p^{Σxi + α − 1} (1 − p)^{n − Σxi + β − 1} dp = 1/2
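A short sketch of the two Bayes rules for this Beta posterior (Python, using scipy's Beta distribution; the prior parameters, data, and seed are illustrative):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
alpha_0, beta_0 = 2, 2
x = rng.binomial(1, 0.3, size=30)

a_post = x.sum() + alpha_0           # Σxi + α
b_post = x.size - x.sum() + beta_0   # n − Σxi + β
posterior = beta(a_post, b_post)

print("Bayes rule under squared error loss (posterior mean):   ", posterior.mean())
print("Bayes rule under absolute error loss (posterior median):", posterior.ppf(0.5))
```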

Asymptotic Evaluation of Point Estimators

When the sample size n approaches infinity, the behaviors of an estimator are known as its asymptotic properties.

Definition - Consistency
Let Wn = Wn(X1, …, Xn) = Wn(X) be a sequence of estimators for τ(θ). We say Wn is consistent for estimating τ(θ) if Wn →P τ(θ) under Pθ for every θ ∈ Ω.

Wn →P τ(θ) (Wn converges in probability to τ(θ)) means that, given any ε > 0,

    lim_{n→∞} Pr(|Wn − τ(θ)| ≥ ε) = 0,  or equivalently  lim_{n→∞} Pr(|Wn − τ(θ)| < ε) = 1

The event |Wn − τ(θ)| < ε can be read as "Wn is close to τ(θ)". Consistency means that the probability that Wn is close to τ(θ) approaches 1 as n goes to ∞.

Tools for proving consistency
• Use the definition directly (often complicated).
• Chebychev's Inequality:

    Pr(|Wn − τ(θ)| ≥ ε) = Pr((Wn − τ(θ))² ≥ ε²) ≤ E[(Wn − τ(θ))²] / ε² = MSE(Wn) / ε² = [Bias²(Wn) + Var(Wn)] / ε²

  so it suffices to show that both Bias(Wn) and Var(Wn) converge to zero.
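A simulation sketch of convergence in probability and the Chebychev bound (Python; µ, σ, ε, and the replication count are illustrative). For X̄n the bias is zero, so MSE(X̄n) = Var(X̄n) = σ²/n:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, eps = 1.0, 2.0, 0.5
n_rep = 2_000

for n in (10, 100, 1000):
    xbar = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)
    prob = np.mean(np.abs(xbar - mu) >= eps)   # empirical Pr(|X̄n − µ| ≥ ε)
    bound = sigma**2 / (n * eps**2)            # Chebychev bound: Var(X̄n)/ε² = σ²/(nε²)
    print(f"n={n:5d}  Pr(|Xbar - mu| >= eps) ≈ {prob:.3f}   Chebychev bound = {bound:.3f}")
```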

Theorem for consistency

Theorem 10.1.3
If Wn is a sequence of estimators of τ(θ) satisfying, for all θ,
• lim_{n→∞} Bias(Wn) = 0
• lim_{n→∞} Var(Wn) = 0
then Wn is consistent for τ(θ).

Weak Law of Large Numbers

Theorem 5.5.2
Let X1, …, Xn be iid random variables with E(X) = µ and Var(X) = σ² < ∞. Then X̄n converges in probability to µ, i.e. X̄n →P µ.


Consistent sequence of estimators

Theorem 10.1.5
Let Wn be a consistent sequence of estimators of τ(θ). Let an and bn be sequences of constants satisfying
1. lim_{n→∞} an = 1
2. lim_{n→∞} bn = 0
Then Un = an Wn + bn is also a consistent sequence of estimators of τ(θ).

Continuous Map Theorem
If Wn is consistent for θ and g is a continuous function, then g(Wn) is consistent for g(θ).

Example

Problem
X1, …, Xn are iid samples from a distribution with mean µ and variance σ² < ∞.
1. Show that X̄n is consistent for µ.
2. Show that (1/n) Σ_{i=1}^n (Xi − X̄)² is consistent for σ².
3. Show that (1/(n−1)) Σ_{i=1}^n (Xi − X̄)² is consistent for σ².

Example - Solution

Proof: X̄n is consistent for µ
By the law of large numbers, X̄n is consistent for µ. Alternatively, using Theorem 10.1.3:
• Bias(X̄n) = E(X̄n) − µ = µ − µ = 0
• Var(X̄n) = Var( (Σ_{i=1}^n Xi) / n ) = (1/n²) Σ_{i=1}^n Var(Xi) = σ²/n
• lim_{n→∞} Var(X̄n) = lim_{n→∞} σ²/n = 0
By Theorem 10.1.3, X̄n is consistent for µ.

Solution - consistency for σ²

    (1/n) Σ (Xi − X̄)² = (1/n) Σ (Xi² + X̄² − 2 Xi X̄)
                       = ( Σ Xi² + n X̄² − 2 X̄ Σ Xi ) / n
                       = (1/n) Σ Xi² − X̄²

By the law of large numbers,

    (1/n) Σ Xi² →P E[X²] = µ² + σ²

Note that X̄² is a function of X̄. Define g(x) = x², which is a continuous function. Then X̄² = g(X̄) is consistent for µ². Therefore,

    (1/n) Σ (Xi − X̄n)² = (1/n) Σ Xi² − X̄² →P (µ² + σ²) − µ² = σ²
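A simulation sketch of these consistency results (Python; the Normal(µ, σ) data, sample sizes, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 3.0, 2.0

for n in (10, 1000, 100000):
    x = rng.normal(mu, sigma, size=n)
    xbar = x.mean()
    s2_n = np.sum((x - xbar) ** 2) / n          # (1/n) Σ(Xi − X̄)²
    s2_n1 = np.sum((x - xbar) ** 2) / (n - 1)   # (1/(n−1)) Σ(Xi − X̄)²
    print(f"n={n:6d}  Xbar={xbar:.4f}  s2_n={s2_n:.4f}  s2_n-1={s2_n1:.4f}")

# Both variance estimators approach σ² = 4 as n grows; they differ only by the
# factor a_n = n/(n−1) → 1, which is the point of Theorem 10.1.5.
```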

Solution - consistency for σ² (cont'd)

From the previous slide, (1/n) Σ (Xi − X̄n)² is consistent for σ².
Define Sn² = (1/(n−1)) Σ (Xi − X̄n)² and (Sn*)² = (1/n) Σ (Xi − X̄n)². Then

    Sn² = (1/(n−1)) Σ (Xi − X̄n)² = ( n/(n−1) ) · (Sn*)²

Because (Sn*)² was shown to be consistent for σ² previously, and an = n/(n−1) → 1 as n → ∞, by Theorem 10.1.5, Sn² is also consistent for σ².

Example - Exponential Distribution

Problem
Suppose X1, …, Xn i.i.d. ∼ Exponential(β).
1. Propose a consistent estimator of the median.
2. Propose a consistent estimator of Pr(X ≤ c), where c is a constant.

Consistent estimator for the median

First, we need to express the median in terms of the parameter β.

    ∫_0^m (1/β) e^{−x/β} dx = 1/2
    [−e^{−x/β}]_0^m = 1 − e^{−m/β} = 1/2
    median = m = β log 2

By the law of large numbers, X̄n is consistent for E[X] = β. Applying the continuous mapping theorem with g(x) = x log 2, g(X̄n) = X̄n log 2 is consistent for g(β) = β log 2 (the median).

Consistent estimator of Pr(X ≤ c)

    Pr(X ≤ c) = ∫_0^c (1/β) e^{−x/β} dx = 1 − e^{−c/β} = g(β)

Since X̄ is consistent for β and 1 − e^{−c/β} is a continuous function of β, by the continuous mapping theorem, g(X̄) = 1 − e^{−c/X̄} is consistent for g(β) = Pr(X ≤ c).
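A simulation sketch of both plug-in estimators for the Exponential example (Python; β, c, the sample sizes, and the seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
beta_true, c = 2.0, 1.5
true_median = beta_true * np.log(2)
true_prob = 1 - np.exp(-c / beta_true)

for n in (50, 5000, 500000):
    x = rng.exponential(scale=beta_true, size=n)
    xbar = x.mean()
    median_hat = xbar * np.log(2)        # g(X̄) = X̄ log 2
    prob_hat = 1 - np.exp(-c / xbar)     # g(X̄) = 1 − exp(−c/X̄)
    print(f"n={n:6d}  median_hat={median_hat:.4f} (true {true_median:.4f})  "
          f"prob_hat={prob_hat:.4f} (true {true_prob:.4f})")
```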


Consistent estimator of Pr(X ≤ c) - Alternative Method

Define Yi = I(Xi ≤ c). Then Yi i.i.d. ∼ Bernoulli(p), where p = Pr(X ≤ c). Hence

    Ȳ = (1/n) Σ_{i=1}^n Yi = (1/n) Σ_{i=1}^n I(Xi ≤ c)

is consistent for p by the law of large numbers.

Summary

Today
• Bayes risk and risk functions
• Consistency
• Law of Large Numbers

Next Lecture
• …
• Slutsky Theorem
• …
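A short sketch comparing the two consistent estimators of Pr(X ≤ c) on the same sample (Python; β, c, n, and the seed are illustrative). The indicator average does not rely on the Exponential model, while the plug-in estimator uses it and is typically more precise when the model is correct:

```python
import numpy as np

rng = np.random.default_rng(7)
beta_true, c, n = 2.0, 1.5, 10_000
x = rng.exponential(scale=beta_true, size=n)

plug_in = 1 - np.exp(-c / x.mean())   # model-based: g(X̄) = 1 − exp(−c/X̄)
empirical = np.mean(x <= c)           # model-free: Ȳ = (1/n) Σ I(Xi ≤ c)
true_prob = 1 - np.exp(-c / beta_true)

print(plug_in, empirical, true_prob)  # both estimators are consistent for Pr(X ≤ c)
```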
