Part II
Bayesian Statistics
Chapter 6
Bayesian Statistics: Background
In the frequency interpretation of probability, the probability of an event is the limiting proportion of times the event occurs in an infinite sequence of independent repetitions of the experiment. This interpretation assumes that an experiment can be repeated!
Problems with this interpretation:
• Independence is defined in terms of probabilities; if probabilities are defined in terms of independent events, this leads to a circular definition.
• How can we check whether experiments were independent, without doing more experiments?
• In practice we only ever have a finite number of experiments.
6.1 Subjective probability
Let P (A) denote your personal probability of an event A; this is a numerical measure of the strength of your degree of belief that A will occur, in the light of available information.
Your personal probabilities may be associated with a much wider class of events than those to which the frequency interpretation pertains. For example:
• non-repeatable experiments (e.g. that England will win the World Cup next time);
• propositions about nature (e.g. that this surgical procedure results in increased life expectancy).
All subjective probabilities are conditional, and may be revised in the light of additional information. Subjective probabilities are assessments in the light of incomplete information; they may even refer to events in the past.
6.2 Axiomatic development
Coherence states that a system of beliefs should avoid internal inconsistencies. Basically, a quantitative, coherent belief system must behave as if it were governed by a subjective probability distribution. In particular this assumes that all events of interest can be compared.
Note: different individuals may assign different probabilities to the same event, even if they have identical background information.
See Chapters 2 and 3 in Bernardo and Smith for a fuller treatment of foundational issues.
6.3 Bayes Theorem and terminology
Bayes Theorem: Let B_1, B_2, \dots, B_k be a set of mutually exclusive and exhaustive events. For any event A with P(A) > 0,
\[
P(B_i \mid A) = \frac{P(B_i \cap A)}{P(A)} = \frac{P(A \mid B_i)\,P(B_i)}{\sum_{j=1}^{k} P(A \mid B_j)\,P(B_j)}.
\]
Equivalently we write
\[
P(B_i \mid A) \propto P(A \mid B_i)\,P(B_i).
\]
Terminology: P(B_i) is called the prior probability of B_i; P(A | B_i) is called the likelihood of A given B_i; P(B_i | A) is called the posterior probability of B_i; P(A) is called the predictive probability of A implied by the likelihoods and the prior probabilities.
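To make the terminology concrete, here is a minimal computational sketch (the function name and all the numbers are invented for illustration): given a vector of prior probabilities P(B_i) and a vector of likelihoods P(A | B_i), it returns the posterior probabilities P(B_i | A).

```python
import numpy as np

def posterior(priors, likelihoods):
    """Bayes Theorem for a finite partition B_1, ..., B_k.

    priors[i]      -- P(B_i)
    likelihoods[i] -- P(A | B_i)
    Returns the vector of posteriors P(B_i | A).
    """
    priors = np.asarray(priors, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    joint = likelihoods * priors   # P(A | B_i) P(B_i)
    predictive = joint.sum()       # P(A) = sum_j P(A | B_j) P(B_j)
    return joint / predictive

# Three hypotheses with made-up priors and likelihoods:
print(posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.8]))
# [0.15151515 0.36363636 0.48484848]
```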
Example: Two events. Assume we have two events B_1, B_2; then
\[
\frac{P(B_1 \mid A)}{P(B_2 \mid A)} = \frac{P(B_1)}{P(B_2)} \times \frac{P(A \mid B_1)}{P(A \mid B_2)}.
\]
If the data is relatively more probable under B_1 than under B_2, our belief in B_1 compared to B_2 is increased, and conversely. If in addition B_2 = B_1^c, then
\[
\frac{P(B_1 \mid A)}{P(B_1^c \mid A)} = \frac{P(B_1)}{P(B_1^c)} \times \frac{P(A \mid B_1)}{P(A \mid B_1^c)}.
\]
It follows that:
posterior odds = prior odds × likelihood ratio.
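As a quick numerical sketch of this identity (all the numbers here are made up for illustration):

```python
# Posterior odds = prior odds * likelihood ratio, with made-up numbers.
p_B1, p_A_given_B1 = 0.2, 0.9     # P(B_1),   P(A | B_1)
p_B1c, p_A_given_B1c = 0.8, 0.3   # P(B_1^c), P(A | B_1^c)

prior_odds = p_B1 / p_B1c                         # 0.25
likelihood_ratio = p_A_given_B1 / p_A_given_B1c   # 3.0
posterior_odds = prior_odds * likelihood_ratio    # 0.75

# Cross-check against Bayes Theorem applied directly:
p_A = p_A_given_B1 * p_B1 + p_A_given_B1c * p_B1c   # predictive P(A) = 0.42
p_B1_given_A = p_A_given_B1 * p_B1 / p_A            # 3/7
assert abs(posterior_odds - p_B1_given_A / (1 - p_B1_given_A)) < 1e-12
```

Here the data (likelihood ratio 3) triples the prior odds of 1:4, giving posterior odds of 3:4.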
6.4 Parametric models
A Bayesian statistical model consists of:
1. A parametric statistical model f(x | θ) for the data x, where θ ∈ Θ is a parameter; x may be multidimensional.
2. A prior distribution π(θ) on the parameter.
Note: The parameter θ is now treated as random!
The posterior distribution of θ given x is
\[
\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{\int f(x \mid \theta)\,\pi(\theta)\,d\theta}.
\]
More briefly, we write π(θ | x) ∝ f(x | θ)π(θ), or
posterior ∝ prior × likelihood .
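As a sketch of how this relation works numerically for a parametric model (the model and the data point are invented for illustration: a N(0, 1) prior for θ and a single observation x from N(θ, 1)), one can evaluate prior and likelihood on a grid of θ values and normalize; the normalizing integral is exactly the denominator ∫ f(x | θ)π(θ)dθ above.

```python
import numpy as np

# posterior ∝ prior × likelihood, evaluated on a grid of theta values.
theta = np.linspace(-5, 5, 1001)
x = 1.3                                      # made-up observed data point

prior = np.exp(-0.5 * theta**2)              # N(0, 1) prior, up to a constant
likelihood = np.exp(-0.5 * (x - theta)**2)   # x ~ N(theta, 1), up to a constant

unnormalized = prior * likelihood
posterior = unnormalized / np.trapz(unnormalized, theta)  # normalize numerically

# In this conjugate setup the exact posterior is N(x/2, 1/2), so the
# grid-based posterior mean should be close to x/2 = 0.65.
print(np.trapz(theta * posterior, theta))
```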
Example: normal distribution. Suppose that θ has the normal N(µ, σ²)-distribution. Then, just focussing on what depends on θ,
\[
\begin{aligned}
\pi(\theta) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(\theta - \mu)^2\right) \\
&\propto \exp\left(-\frac{1}{2\sigma^2}(\theta - \mu)^2\right) \\
&= \exp\left(-\frac{1}{2\sigma^2}(\theta^2 - 2\theta\mu + \mu^2)\right) \\
&\propto \exp\left(-\frac{1}{2\sigma^2}\theta^2 + \frac{\mu}{\sigma^2}\theta\right).
\end{aligned}
\]
Conversely, if there are constants a > 0 and b such that
\[
\pi(\theta) \propto \exp\left(-a\theta^2 + b\theta\right),
\]
then, matching coefficients with the expression above, θ has the normal distribution with mean b/(2a) and variance 1/(2a).
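A small numerical check of this converse (with arbitrary made-up values of a and b):

```python
import numpy as np

# Normalize exp(-a*theta**2 + b*theta) on a grid and compare it with the
# N(b/(2a), 1/(2a)) density; a and b are arbitrary made-up values.
a, b = 0.7, 1.1
mu, var = b / (2 * a), 1 / (2 * a)           # completed-square parameters

theta = np.linspace(-10, 10, 100001)
kernel = np.exp(-a * theta**2 + b * theta)
density = kernel / np.trapz(kernel, theta)   # normalize numerically

normal = np.exp(-(theta - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
print(np.max(np.abs(density - normal)))      # close to zero: the curves agree
```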