David Giles Bayesian Econometrics
3. Properties of Bayes Estimators & Tests
(i) Some general results for Bayes Decision Rules.
(ii) Bayes estimators - some exact results.
(iii) Bayes estimators - some asymptotic (large-n) results.
(iv) Bayesian interval estimation.
General Results for Bayes Decision Rules
• Recall - Bayes rules and MEL rules are (generally) equivalent.
• Minimum Expected Loss (MEL) Rule: "Act so as to minimize (posterior) expected loss":

$\int_{\Omega} L(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})\, p(\boldsymbol{\theta} \mid \mathbf{y})\, d\boldsymbol{\theta}$

• Bayes' Rule: "Act so as to minimize average risk" (often called the "Bayes risk"):

$r(\hat{\boldsymbol{\theta}}) = \int_{\Omega} R(\boldsymbol{\theta}, \hat{\boldsymbol{\theta}})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}$

• The action will involve selecting an estimator, or rejecting some hypothesis, for instance.
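As a minimal numerical sketch (assuming a hypothetical N(4, 1) posterior discretized on a grid; all numbers are invented for illustration), the MEL rule under quadratic loss can be found by minimizing posterior expected loss over candidate actions, and the minimizer coincides with the posterior mean:

```python
import numpy as np

# Hypothetical discretized posterior p(theta | y): an N(4, 1) kernel on a grid.
theta = np.linspace(0.0, 10.0, 2001)
dx = theta[1] - theta[0]
post = np.exp(-0.5 * (theta - 4.0) ** 2)
post /= post.sum() * dx                      # normalize numerically

def expected_loss(action):
    """Posterior expected quadratic loss of a single action."""
    return np.sum((theta - action) ** 2 * post) * dx

# MEL rule: choose the action with the smallest posterior expected loss.
actions = theta                              # candidate actions on the same grid
risks = np.array([expected_loss(a) for a in actions])
mel_action = actions[int(np.argmin(risks))]

post_mean = np.sum(theta * post) * dx
print(mel_action, post_mean)                 # both ≈ 4.0
```

Under a different loss function the same minimization picks out a different posterior summary (e.g., the median under absolute-error loss, the mode under zero-one loss).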
• We have the following general results –
(i) A Mini-max Rule always exists.
(ii) Any decision rule that is Admissible must be a Bayes’ Rule with
respect to some prior distribution.
(iii) If the prior distribution is “proper”, every Bayes’ Rule is
Admissible.
(iv) If a Bayes’ Rule has constant risk, then it is Mini-max.
(v) If a Mini-max Rule corresponds to a unique Bayes’ Rule, then
this Mini-max Rule is also unique.
Bayes Estimators – Some Exact Results
• As long as the prior p.d.f., 푝(휽) is “proper”, the corresponding Bayes’
estimator is Admissible.
• Bayesians are not really interested in the properties of their estimators in a
“repeated sampling” situation.
• Interested in behaviour that’s conditional on the data in the current sample.
• Contrast “Bayesian Posterior Probability Intervals” (or, “Bayesian Credible
Intervals”) with “Confidence Intervals”.
• However, note that Bayes’ estimators may be biased or unbiased in finite
samples.
Bayes Estimators – Some Asymptotic Results
• Intuitively, we’d expect that as the sample size grows, the Likelihood
Function will dominate the Prior.
• In the limit, we might expect that Bayes’ estimators will converge to
MLE’s.
• An exception will be if the prior is “totally dogmatic” (degenerate).
• So, not surprisingly, Bayes estimators are weakly consistent.
• The principal asymptotic result associated with Bayes’ estimators is the so-
called “Bernstein-von Mises Theorem”.
[Photographs: Sergei Bernstein (1880 – 1968) and Richard von Mises (1883 – 1953)]
Theorem: Unless $p(\boldsymbol{\theta})$ is degenerate,

$\lim_{n \to \infty} \{p(\boldsymbol{\theta} \mid \mathbf{y})\} = N[\tilde{\boldsymbol{\theta}}, I(\tilde{\boldsymbol{\theta}})^{-1}]$ ;

where $\tilde{\boldsymbol{\theta}}$ is the MLE of $\boldsymbol{\theta}$.
Proof (for the scalar case): From Bayes' Theorem,

$p(\theta \mid \mathbf{y}) \propto p(\theta)L(\theta \mid \mathbf{y}) = p(\theta)\exp\{\log L(\theta \mid \mathbf{y})\}$

• Take a Taylor series expansion of $p(\theta)$ about the MLE, $\tilde{\theta}$:

$p(\theta) = p(\tilde{\theta}) + (\theta - \tilde{\theta})p'(\tilde{\theta}) + \frac{1}{2}(\theta - \tilde{\theta})^2 p''(\tilde{\theta}) + \cdots$

$= p(\tilde{\theta})\left[1 + (\theta - \tilde{\theta})\frac{p'(\tilde{\theta})}{p(\tilde{\theta})} + \frac{1}{2}(\theta - \tilde{\theta})^2 \frac{p''(\tilde{\theta})}{p(\tilde{\theta})} + \cdots\right]$

$\propto \left[1 + (\theta - \tilde{\theta})\frac{p'(\tilde{\theta})}{p(\tilde{\theta})} + \frac{1}{2}(\theta - \tilde{\theta})^2 \frac{p''(\tilde{\theta})}{p(\tilde{\theta})} + \cdots\right]$
• Let $l(\theta) = \log L(\theta \mid \mathbf{y})$. Then

$\exp\{\log L(\theta \mid \mathbf{y})\} = \exp\{l(\theta)\}$

$= \exp\left\{l(\tilde{\theta}) + (\theta - \tilde{\theta})l'(\tilde{\theta}) + \frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta}) + \cdots\right\}$

• Note that $l'(\tilde{\theta}) = 0$ (the first-order condition defining the MLE).
• So,

$\exp\{l(\theta)\} = \exp\{l(\tilde{\theta})\}\, \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\} \exp\left\{\frac{1}{6}(\theta - \tilde{\theta})^3 l'''(\tilde{\theta})\right\} \cdots$
Or,

$\exp\{l(\theta)\} \propto \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\} \exp\left\{\frac{1}{6}(\theta - \tilde{\theta})^3 l'''(\tilde{\theta})\right\} \cdots$

• Expand the higher-order exponential:

$\exp\{l(\theta)\} \propto \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\}\left[1 + \frac{1}{6}(\theta - \tilde{\theta})^3 l'''(\tilde{\theta}) + \cdots\right]$

• The leading term is

$\exp\{l(\theta)\} \propto \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\}$

• This applies for large n: the posterior concentrates so that $(\theta - \tilde{\theta})$ is $O_p(n^{-1/2})$ while the log-likelihood derivatives are $O(n)$, so the cubic and higher-order terms vanish asymptotically.

In this case,

$p(\theta \mid \mathbf{y}) \propto \exp\{l(\theta)\}\, p(\theta)$
$\propto \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\}\left[1 + (\theta - \tilde{\theta})\frac{p'(\tilde{\theta})}{p(\tilde{\theta})} + \frac{1}{2}(\theta - \tilde{\theta})^2 \frac{p''(\tilde{\theta})}{p(\tilde{\theta})} + \cdots\right]$

If n is large enough,

$p(\theta \mid \mathbf{y}) \propto \exp\left\{\frac{1}{2}(\theta - \tilde{\theta})^2 l''(\tilde{\theta})\right\}$.

This is the kernel of a Normal density, centered at $\tilde{\theta}$, and with a variance of $-1/l''(\tilde{\theta})$.

• Asymptotically, our Bayes estimator under a zero-one Loss Function will coincide with the MLE.
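A quick numerical illustration of the theorem (a sketch, assuming a Bernoulli likelihood with a hypothetical Beta(3, 2) prior, so everything is available in closed form): as n grows, the Beta posterior's mode approaches the MLE, and its standard deviation approaches the Bernstein-von Mises value $\sqrt{\tilde{\theta}(1 - \tilde{\theta})/n}$:

```python
import math

a, b = 3.0, 2.0          # hypothetical (proper, non-dogmatic) Beta prior
p_true = 0.3

for n in (10, 100, 10000):
    s = round(p_true * n)                      # stylized data: s successes in n trials
    mle = s / n                                # MLE of theta
    alpha_post, beta_post = s + a, n - s + b   # Beta posterior parameters
    post_mode = (alpha_post - 1) / (alpha_post + beta_post - 2)
    # Exact posterior s.d. versus the asymptotic normal s.d. from the theorem:
    tot = alpha_post + beta_post
    post_sd = math.sqrt(alpha_post * beta_post / (tot ** 2 * (tot + 1)))
    bvm_sd = math.sqrt(mle * (1 - mle) / n)
    print(n, abs(post_mode - mle), abs(post_sd - bvm_sd))   # gaps shrink with n
```

With a degenerate (totally dogmatic) prior the posterior would never move, which is why the theorem excludes that case.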
Bayesian Interval Estimation

• Consider the Bayesian counterpart to a Confidence Interval.
• Totally different interpretation – nothing to do with repeated sampling.
• An interval of the form $(\theta_L, \theta_U)$, such that

$\Pr[(\theta_L < \theta < \theta_U) \mid \mathbf{y}] = p\%,$

is a p% Bayesian Posterior Probability Interval, or a p% Bayesian Credible Interval, for $\theta$.
• Obvious extension to the case where 휽 is a vector: a 푝% Bayesian Posterior
Probability Region, or a 푝% Bayesian Credible Region for 휽.
• When we construct a (frequentist) Confidence Interval or Region, we
usually try to make the interval as short (small) as possible for the desired
confidence level.
• We can also consider constructing an “optimal” Bayesian Posterior
Probability Region, as follows:
Given a posterior density function $p(\boldsymbol{\theta} \mid \mathbf{y})$, let A be a subset of the parameter space such that:

(i) $\Pr[\boldsymbol{\theta} \in A \mid \mathbf{y}] = (1 - \alpha)$

(ii) For all $\boldsymbol{\theta}_1 \in A$ and $\boldsymbol{\theta}_2 \notin A$, $p(\boldsymbol{\theta}_1 \mid \mathbf{y}) \geq p(\boldsymbol{\theta}_2 \mid \mathbf{y})$.

Then A is a Highest Posterior Density (HPD) Region of content $(1 - \alpha)$ for $\boldsymbol{\theta}$.
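A numerical sketch of this construction (with a hypothetical bimodal posterior on a grid, chosen to show that an HPD region need not be a single interval): admit grid points in order of decreasing density until the accumulated posterior content reaches $1 - \alpha$:

```python
import numpy as np

# Hypothetical bimodal posterior: a mixture of two normal kernels on a grid.
theta = np.linspace(-8.0, 8.0, 4001)
dx = theta[1] - theta[0]
dens = 0.6 * np.exp(-0.5 * (theta + 3) ** 2) + 0.4 * np.exp(-0.5 * (theta - 3) ** 2)
dens /= dens.sum() * dx
alpha = 0.05

# Condition (ii): keep the highest-density points first; condition (i): stop
# once the included posterior content reaches 1 - alpha.
order = np.argsort(dens)[::-1]
mass = np.cumsum(dens[order]) * dx
n_keep = int(np.searchsorted(mass, 1 - alpha)) + 1
in_hpd = np.zeros(theta.shape, dtype=bool)
in_hpd[order[:n_keep]] = True

content = dens[in_hpd].sum() * dx
pieces = np.sum(np.diff(np.r_[0, in_hpd.astype(int), 0]) == 1)
print(round(content, 3), pieces)   # content ≈ 0.95; the region has 2 disjoint pieces
```

For a unimodal posterior the same algorithm returns a single interval, matching the [a1, a2] case in the figure that follows.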
• For a given 훼, the HPD has the smallest possible volume.
• Provided 푝(휽 | 풚) is not constant (uniform) over any region of the parameter space, the
HPD is unique.
• A BPI (BCI) can be used in an obvious way to test simple null hypotheses, just
as a C.I. is used for this purpose by non-Bayesians.
• See more of this later in the course.
[Figure: posterior p.d.f.s and their HPD regions: a single interval [a1, a2]; a disjoint union [b1, b2] ∪ [b3, b4] for a multimodal posterior; an interval [c1, c2].]
Example 1

$y_i \sim N[\mu, \sigma_0^2]$ ; $\sigma_0^2$ is known.

• Before we see the sample of data, we have prior beliefs about the value of $\mu$:

$p(\mu) = p(\mu \mid \sigma_0^2) \sim N[\bar{\mu}, \bar{v}]$

That is,

$p(\mu) \propto \exp\left\{-\frac{1}{2\bar{v}}(\mu - \bar{\mu})^2\right\}$
• Now we take a random sample of data: $\mathbf{y} = (y_1, y_2, \ldots, y_n)$.

• The joint data density (i.e., the Likelihood Function) is:
$p(\mathbf{y} \mid \mu, \sigma_0^2) = L(\mu \mid \mathbf{y}, \sigma_0^2) \propto \exp\left\{-\frac{1}{2\sigma_0^2}\sum_{i=1}^{n}(y_i - \mu)^2\right\}$

• Bayes' Theorem:

$p(\mu \mid \mathbf{y}, \sigma_0^2) \propto p(\mu \mid \sigma_0^2)\, p(\mathbf{y} \mid \mu, \sigma_0^2)$

• So,

$p(\mu \mid \mathbf{y}, \sigma_0^2) \propto \exp\left\{-\frac{1}{2}\left(\frac{1}{\bar{v}} + \frac{n}{\sigma_0^2}\right)\left(\mu - \frac{\frac{\bar{\mu}}{\bar{v}} + \frac{n\bar{y}}{\sigma_0^2}}{\frac{1}{\bar{v}} + \frac{n}{\sigma_0^2}}\right)^2\right\}$
• The Posterior distribution for $\mu$ is $N[\bar{\bar{\mu}}, \bar{\bar{v}}]$, where

$\bar{\bar{\mu}} = \frac{\left(\frac{1}{\bar{v}}\right)\bar{\mu} + \left(\frac{n}{\sigma_0^2}\right)\bar{y}}{\frac{1}{\bar{v}} + \frac{n}{\sigma_0^2}} \qquad ; \qquad \frac{1}{\bar{\bar{v}}} = \frac{1}{\bar{v}} + \frac{n}{\sigma_0^2}$
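These updating formulas are just a precision-weighted average of the prior mean and the sample mean. A small sketch with hypothetical numbers (prior N(1, 4), n = 25, $\sigma_0^2 = 9$, $\bar{y} = 2$; all invented for illustration):

```python
# Hypothetical inputs (for illustration only).
mu_bar, v_bar = 1.0, 4.0        # prior mean and prior variance
n, sig2, ybar = 25, 9.0, 2.0    # sample size, known variance, sample mean

prec = 1.0 / v_bar + n / sig2                         # posterior precision, 1/v_dbar
v_dbar = 1.0 / prec                                   # posterior variance
mu_dbar = (mu_bar / v_bar + n * ybar / sig2) / prec   # precision-weighted posterior mean
print(mu_dbar, v_dbar)
# mu_dbar lies between the prior mean (1.0) and the sample mean (2.0),
# pulled toward ybar because the data precision n/sig2 dominates here.
```

Note that the posterior variance is smaller than both the prior variance and the sampling variance of $\bar{y}$: combining the two information sources always sharpens the posterior.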
• So, a 95% HPD interval for 휇 is obtained as follows –
$\Pr[-1.96 < Z < 1.96] = 95\%$

$\Pr\left[-1.96 < \frac{\mu - \bar{\bar{\mu}}}{\sqrt{\bar{\bar{v}}}} < 1.96 \,\Big|\, \mathbf{y}\right] = 95\%$

$\Pr\left[\bar{\bar{\mu}} - 1.96\sqrt{\bar{\bar{v}}} < \mu < \bar{\bar{\mu}} + 1.96\sqrt{\bar{\bar{v}}} \,\Big|\, \mathbf{y}\right] = 95\%$
• The 95% HPD interval for $\mu$ is the interval

$\left[\bar{\bar{\mu}} - 1.96\sqrt{\bar{\bar{v}}} \;,\; \bar{\bar{\mu}} + 1.96\sqrt{\bar{\bar{v}}}\right]$
• The (posterior) probability that 휇 lies in this interval is 95%.
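Because the posterior here is exactly normal (symmetric and unimodal), the equal-tailed interval is also the HPD interval, and its content can be checked directly with the standard normal CDF (a sketch using only the standard library; `math.erf` gives the CDF without external packages):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Posterior probability that mu lies within 1.96 posterior s.d.s of the
# posterior mean, i.e. the content of the HPD interval above:
content = norm_cdf(1.96) - norm_cdf(-1.96)
print(round(content, 4))    # ≈ 0.95, the stated 95%
```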
Example 2
• Random sample of n observations from an Exponential distribution, so

$p(y_i \mid \theta) = \theta\exp(-\theta y_i)$ ; $\theta > 0$
• The prior density for $\theta$ is chosen to be Gamma($\alpha$, $\beta$), where $\alpha > 0$ is the shape parameter and $\beta > 0$ is the scale parameter:

$p(\theta) \propto \theta^{\alpha - 1}\exp\left(-\frac{\theta}{\beta}\right)$
• So, the likelihood function is
$p(\mathbf{y} \mid \theta) = \theta^n \exp(-n\bar{y}\theta)$
• Bayes’ Theorem:
$p(\theta \mid \mathbf{y}) \propto \theta^{n + \alpha - 1}\exp\{-(n\bar{y} + \beta^{-1})\theta\}$
• This posterior is Gamma, with parameters $(n + \alpha)$ and $1/(n\bar{y} + \beta^{-1})$.
• Another example of Natural Conjugacy.
• Bayes point estimators of 휃
(i) Quadratic Loss: 휃̂ = (푛 + 훼)/ (푛푦̅ + 훽−1).
(ii) Zero-One Loss: 휃̂ = (푛 + 훼 − 1)/ (푛푦̅ + 훽−1).
• Suppose that 훼 = 2 and 훽 = 4 .
• If n = 2 and $\bar{y} = 0.125$, what is the posterior probability of the BCI
[3.49, 15.5]?
• In this case the posterior density is Gamma [4 , 2].
• This is the same as a Chi-Square density with 8 degrees of freedom.
• Because Gamma[(v/2), 2] = Chi-Square(v).

• $\Pr[\chi^2_{(8)} < 3.49] = 0.10$, and $\Pr[\chi^2_{(8)} < 15.5] = 0.95$.
• So, the posterior probability for this BCI is (0.95 – 0.10) = 0.85.
• What are the Bayes point estimates of 휃 under quadratic and 0-1 losses?
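One way to check all of these numbers (a sketch using only the formulas already stated on these slides; the chi-square CDF for even degrees of freedom has a closed form, so no external libraries are needed):

```python
import math

alpha, beta = 2.0, 4.0     # prior Gamma(shape = 2, scale = 4)
n, ybar = 2, 0.125         # sample information

shape = n + alpha                         # posterior shape = 4
scale = 1.0 / (n * ybar + 1.0 / beta)     # posterior scale = 1/0.5 = 2

def chi2_cdf(x, df):
    """Chi-square CDF for even df, via the closed-form Poisson sum."""
    return 1.0 - math.exp(-x / 2.0) * sum(
        (x / 2.0) ** j / math.factorial(j) for j in range(df // 2))

# Gamma(v/2, 2) = Chi-Square(v), so Gamma(4, 2) is chi-square with 8 d.o.f.
p_bci = chi2_cdf(15.5, 8) - chi2_cdf(3.49, 8)   # posterior prob. of [3.49, 15.5]
theta_quad = shape * scale                       # posterior mean: quadratic loss
theta_01 = (shape - 1) * scale                   # posterior mode: zero-one loss
print(round(p_bci, 2), theta_quad, theta_01)     # 0.85 8.0 6.0
```

So the BCI carries posterior probability 0.85, and the point estimates are 8 under quadratic loss and 6 under zero-one loss.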