Bayesian Inference and Decision Theory
Unit 5: The Normal Model
©Kathryn Blackmond Laskey Spring 2021 Unit 5v2a

Learning Objectives for Unit 5
• Describe the conjugate prior distribution for the normal distribution with:
  • Unknown mean and known variance
  • Unknown mean and unknown variance
• Extend techniques from previous units to infer the posterior distribution for the mean, and the variance if unknown, of a normal distribution from a sample of observations
• Use Monte Carlo sampling to estimate posterior quantities from a normal distribution
• Explain why methods for normally distributed data are often used with non-normal data
  • State conditions under which this is justified
  • State conditions under which this practice can lead to misleading results
• Analyze accuracy of point estimators: mean squared error, bias, and variance
The Normal (Gaussian) Distribution
• Most studied and most applied distribution in statistics
• Probability density function is the familiar bell-shaped curve:
$$ f(x \mid \theta, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x - \theta}{\sigma} \right)^2 \right\} $$
  Note: exp{x} means $e^x$
• Symmetric around $\theta$
• 95% of probability lies within about $2\sigma$ of $\theta$
• Linear combinations of normally distributed random variables are normally distributed
• Central Limit Theorem: If $\bar{X}$ is the sample mean of $n$ iid observations from a distribution with mean $\theta$ and variance $\sigma^2$, then as $n$ grows large, $(\bar{X} - \theta)/(\sigma/\sqrt{n})$ converges to a normal random variable with mean 0 and variance 1
[Figure: Standard Normal Probability Density Function ($\theta = 0$, $\sigma = 1$); horizontal axis x from -4 to 4, vertical axis Probability Density]

Conjugate Prior Distribution for Normal Observations: Unknown Mean, Known Standard Deviation
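• The observations $x_1, \dots, x_n$ are a random sample from a normal distribution with unknown mean $\Theta$ and known standard deviation $\sigma$. Each observation has density:
$$ f(x \mid \theta, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x - \theta}{\sigma} \right)^2 \right\} $$
  Note: exp{x} means $e^x$
• Conjugate prior distribution for $\Theta$ is a normal distribution with mean $\mu$ and standard deviation $\tau$:
$$ g(\theta \mid \mu, \tau) = \frac{1}{\sqrt{2\pi}\,\tau} \exp\left\{ -\frac{1}{2} \left( \frac{\theta - \mu}{\tau} \right)^2 \right\} $$
• Joint gpdf for $(X, \Theta)$:
$$ f(x \mid \theta, \sigma)\, g(\theta \mid \mu, \tau) = \frac{1}{\sqrt{2\pi}\,\tau} \cdot \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left[ \left( \frac{\theta - \mu}{\tau} \right)^2 + \left( \frac{x - \theta}{\sigma} \right)^2 \right] \right\} $$
  Notice the quadratic terms inside the exponent

Finding the Posterior Distribution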
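• Prior times likelihood:
$$ f(x \mid \theta, \sigma)\, g(\theta \mid \mu, \tau) = \frac{1}{\sqrt{2\pi}\,\tau} \left( \frac{1}{\sqrt{2\pi}\,\sigma} \right)^n \exp\left\{ -\frac{1}{2} \left[ \left( \frac{\theta - \mu}{\tau} \right)^2 + \sum_{i=1}^{n} \left( \frac{x_i - \theta}{\sigma} \right)^2 \right] \right\} $$
• Reexpress terms inside square brackets:
  • Terms involving $\theta^2$: $\left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right) \theta^2$
  • Terms involving $\theta$: $-2 \left( \frac{\mu}{\tau^2} + \frac{\sum x_i}{\sigma^2} \right) \theta$
  • Remaining terms: $\frac{\mu^2}{\tau^2} + \frac{\sum x_i^2}{\sigma^2}$
• Add and subtract:
$$ \frac{\left( \frac{\mu}{\tau^2} + \frac{\sum x_i}{\sigma^2} \right)^2}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} $$
• Define:
$$ \mu^* = \frac{\frac{\mu}{\tau^2} + \frac{\sum x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}, \qquad \tau^* = \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)^{-1/2} $$
• Expression in brackets becomes:
$$ \left( \frac{\theta - \mu^*}{\tau^*} \right)^2 + \left[ \frac{\mu^2}{\tau^2} + \frac{\sum x_i^2}{\sigma^2} - \frac{\left( \frac{\mu}{\tau^2} + \frac{\sum x_i}{\sigma^2} \right)^2}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}} \right] $$
  The first term is the exponent in a normal density; the second term is a constant that does not depend on $\theta$
• Posterior pdf for $\Theta$:
$$ g(\theta \mid x, \mu, \tau) = g(\theta \mid \mu^*, \tau^*) = \frac{1}{\sqrt{2\pi}\,\tau^*} \exp\left\{ -\frac{1}{2} \left( \frac{\theta - \mu^*}{\tau^*} \right)^2 \right\} $$
• Posterior distribution of $\Theta$ is normal with mean $\mu^*$ and standard deviation $\tau^*$

Summary: Prior to Posterior Normal / Normal Conjugate Pair (known variance)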
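IF observations $x_1, \dots, x_n$ are a random sample from Normal($\Theta$, $\sigma$) and the prior distribution for $\Theta$ is Normal($\mu$, $\tau$)
THEN the posterior distribution for $\Theta$ is Normal($\mu^*$, $\tau^*$), another member of the conjugate family. The posterior hyperparameters are given by:
$$ \mu^* = \frac{\frac{\mu}{\tau^2} + \frac{\sum x_i}{\sigma^2}}{\frac{1}{\tau^2} + \frac{n}{\sigma^2}}, \qquad \tau^* = \left( \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right)^{-1/2} $$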
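The posterior hyperparameter formulas in this summary can be sketched in a few lines of Python. This is a minimal illustration rather than the course's provided code; the function name `normal_posterior` and the numbers in the usage lines are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def normal_posterior(mu, tau, sigma, xs):
    """Posterior (mean, SD) for the mean of normal data with known SD sigma.

    mu, tau : mean and standard deviation of the normal prior on the mean
    sigma   : known standard deviation of each observation
    xs      : list of observations
    """
    n = len(xs)
    prior_term = 1.0 / tau**2   # 1 / tau^2
    data_term = n / sigma**2    # n / sigma^2
    mu_star = (mu / tau**2 + sum(xs) / sigma**2) / (prior_term + data_term)
    tau_star = (prior_term + data_term) ** -0.5
    return mu_star, tau_star


# Hypothetical illustration (these numbers are not from the slides):
# prior Normal(0, 1), known sigma = 2, two observations.
mu_star, tau_star = normal_posterior(mu=0.0, tau=1.0, sigma=2.0, xs=[1.0, 3.0])
print(mu_star, tau_star)
```

With these inputs the prior term is 1 and the data term is 2/4 = 0.5, so the posterior mean is (0 + 4/4)/1.5 = 2/3, which lies between the prior mean 0 and the sample mean 2.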
A Useful Re-Parameterization
• Define the precision as the inverse of the variance:
  • $X_i$ has precision $\rho = 1/\sigma^2$
  • $\Theta$ has precision $\lambda = 1/\tau^2$
  • Precision is always positive and is measured in the inverse square units of $X$
• Parameters for the posterior distribution of $\Theta$:
  • $\lambda^* = \lambda + n\rho$
  • $\mu^* = \frac{\lambda\mu + \rho \sum_i X_i}{\lambda + n\rho} = \frac{\lambda\mu + (n\rho)\left( \frac{1}{n} \sum_i X_i \right)}{\lambda + n\rho}$
• Each observation increases the precision of the posterior distribution by the precision $\rho$ of one observation
• The posterior mean is a weighted average of the observations and the prior mean:
  • The prior mean receives a relative weight of $\lambda$, the prior precision of the mean
  • Each observation receives a relative weight of $\rho$, the precision of the data distribution
• $\sum_i X_i$ is sufficient for $\Theta$

Example: Reaction Time Data
• Gelman et al. analyze data on reaction times for 11 non-schizophrenic and 6 schizophrenic subjects. The data set can be found at: http://www.stat.columbia.edu/~gelman/book/data/schiz.asc
• Gelman et al. model log reaction times of the 11 non-schizophrenic subjects as normally distributed with subject-dependent mean and common standard deviation
  • The normal distribution fits well for some subjects and less well for others
  • The chart shows a normal Q-Q plot for log reaction times for the first subject in the study
• We will use this subject's data to illustrate conjugate normal updating with known standard deviation
  • Assume standard deviation is known and equal to sample standard deviation for this subject
  • Value of sample standard deviation is $\sigma = 0.127$
  • Later we will consider unknown standard deviation
Reaction Times: Posterior Distribution
• The observations are normal with mean $\Theta$ and standard deviation $\sigma = 0.127$ (assumed known and equal to sample standard deviation)
• Assume uniform prior distribution for $\Theta$
  • Limit of a Normal(0, $\tau$) distribution as $\tau$ tends to infinity
  • This is the Jeffreys prior
  • It is an improper distribution (integrates to ∞)
• Data: log reaction time of first non-schizophrenic subject, 30 observations, sample mean 5.73
• The posterior distribution is normal with:
  • Posterior mean: $\mu^* = \frac{\sum x_i / 0.127^2}{30 / 0.127^2} = \frac{1}{30} \sum x_i = 5.73$
  • Posterior standard deviation: $\tau^* = \left( \frac{1}{\infty} + \frac{30}{0.127^2} \right)^{-1/2} = 0.023$
• 95% posterior credible interval for $\Theta$: [5.69, 5.78]
• R code for this example is provided

Normal Conjugate Updating: Sequential Processing of Observations
• We will use the reaction time data to illustrate sequential Bayesian updating as we receive new batches of observations
  • Receive observations in batches of 5
  • Update distribution for mean after each batch
  • Assume observations are normal with unknown mean and known standard deviation (equal to sample standard deviation of observations)
• After each batch we update our distribution for $\Theta$ and predict the sample mean of the next batch of observations
• To predict the next batch of observations we will need to know the predictive distribution (marginal likelihood) of the sample mean
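The batch-by-batch procedure described above can be sketched in Python using the precision form of the update (posterior precision = prior precision + n × data precision). This is an illustrative sketch, not the R code that accompanies the course. The helper name `update` is hypothetical, the flat prior is represented as precision 0 (an encoding assumed here for convenience), and the two batches are the log reaction times listed on the later slides of this unit.

```python
import math

SIGMA = 0.127  # known SD, taken from the reaction-time example

# First two batches of five log reaction times, as listed in this unit
batch1 = [5.743, 5.606, 5.858, 5.656, 5.591]
batch2 = [5.793, 5.697, 5.875, 5.677, 5.730]


def update(mean, precision, xs, sigma=SIGMA):
    """One conjugate update in precision form; returns (new_mean, new_precision)."""
    rho = 1.0 / sigma**2                      # precision of a single observation
    new_precision = precision + len(xs) * rho
    new_mean = (precision * mean + rho * sum(xs)) / new_precision
    return new_mean, new_precision


# Flat prior encoded as precision 0; the prior mean is then irrelevant.
state = (0.0, 0.0)
history = []  # (posterior mean, posterior SD) after each batch
for batch in (batch1, batch2):
    state = update(*state, batch)
    history.append((state[0], state[1] ** -0.5))

# Processing all ten observations at once gives the same posterior.
all_at_once = update(0.0, 0.0, batch1 + batch2)

for mean, sd in history:
    print(round(mean, 3), round(sd, 3))
```

Because the posterior after one batch serves as the prior for the next, the final state matches what a single update on all ten observations produces. The slides report a posterior mean of 5.69 and SD of 0.057 after the first batch, and a posterior mean of 5.723 after the second.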
First Batch of Five Reaction Times
• The observations are normal with mean $\Theta$ and standard deviation $\sigma = 0.127$ (assumed known and equal to sample standard deviation)
• Assume uniform prior distribution for $\Theta$
  • Limit of a Normal(0, $\tau$) distribution as $\tau$ tends to infinity
  • This is the Jeffreys prior
  • It is an improper distribution (integrates to ∞)
• Data: 5.743, 5.606, 5.858, 5.656, 5.591 (average is 5.691)
• The posterior distribution is normal with:
  • Posterior mean: $\mu^* = \frac{\sum x_i / 0.127^2}{5 / 0.127^2} = \frac{1}{5} \sum x_i = 5.69$
  • Posterior standard deviation: $\tau^* = \left( \frac{1}{\infty} + \frac{5}{0.127^2} \right)^{-1/2} = 0.057$
• Posterior 95% credible interval for $\Theta$: [5.58, 5.80]
• To predict the next batch of five reaction times, we will need the marginal likelihood for the normal / normal conjugate pair
[Figure: Triplot for Obs 1-5; curves: Prior, NormLik, Posterior; horizontal axis $\theta$ from 5.5 to 6, vertical axis Probability Density]

Compare: First 5 Observations with Full Data
                              First 5 Observations    All 30 Observations
Posterior mean $\mu^*$        5.69                    5.73
Posterior SD $\tau^*$         0.057                   0.023
95% interval for $\Theta$     [5.58, 5.80]            [5.69, 5.78]
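The comparison can be reproduced approximately from the summary numbers alone. The sketch below assumes a flat prior, so the posterior SD is $\sigma/\sqrt{n}$, and uses 1.96 as the normal multiplier for a 95% interval. The function name `flat_prior_posterior` is hypothetical. The inputs are the rounded sample means from the slides, so interval endpoints may differ from the slide values in the last digit.

```python
import math


def flat_prior_posterior(xbar, n, sigma, z=1.96):
    """Posterior mean, SD, and central 95% interval under a flat prior.

    xbar  : sample mean
    n     : number of observations
    sigma : known standard deviation of one observation
    z     : normal multiplier for the interval (1.96 for roughly 95%)
    """
    sd = sigma / math.sqrt(n)
    return xbar, sd, (xbar - z * sd, xbar + z * sd)


# Rounded inputs taken from the slides
first5 = flat_prior_posterior(xbar=5.691, n=5, sigma=0.127)
all30 = flat_prior_posterior(xbar=5.73, n=30, sigma=0.127)

for label, (mean, sd, (lo, hi)) in (("first 5", first5), ("all 30", all30)):
    print(label, round(mean, 2), round(sd, 3), round(lo, 2), round(hi, 2))
```

With the standard deviation held fixed, going from 5 to 30 observations shrinks the posterior SD by a factor of $\sqrt{6} \approx 2.45$, which is the narrowing visible in the comparison.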
Marginal Likelihood for Normal / Normal Conjugate Pair (single observation)
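• Joint gpdf of $(X, \Theta)$:
$$ f(x \mid \theta, \sigma)\, g(\theta \mid \mu, \tau) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{1}{2} \left( \frac{x - \theta}{\sigma} \right)^2 \right\} \frac{1}{\sqrt{2\pi}\,\tau} \exp\left\{ -\frac{1}{2} \left( \frac{\theta - \mu}{\tau} \right)^2 \right\} $$
$$ = \frac{1}{2\pi\sigma\tau} \exp\left\{ -\frac{1}{2} \left[ \left( \frac{x - \theta}{\sigma} \right)^2 + \left( \frac{\theta - \mu}{\tau} \right)^2 \right] \right\} $$
$$ = \frac{1}{2\pi\sigma\tau} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2 + \tau^2} \right\} \exp\left\{ -\frac{1}{2} \left( \frac{\theta - \mu^*}{\tau^*} \right)^2 \right\} $$
  Use algebraic manipulation to re-express the joint likelihood
• Integrate over $\theta$ to find the marginal likelihood:
$$ f(x \mid \mu, \tau, \sigma) = \int f(x \mid \theta, \sigma)\, g(\theta \mid \mu, \tau)\, d\theta = \frac{1}{2\pi\sigma\tau} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2 + \tau^2} \right\} \int \exp\left\{ -\frac{1}{2} \left( \frac{\theta - \mu^*}{\tau^*} \right)^2 \right\} d\theta $$
$$ = \frac{\sqrt{2\pi}\,\tau^*}{2\pi\sigma\tau} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2 + \tau^2} \right\} = \frac{1}{\sqrt{2\pi (\sigma^2 + \tau^2)}} \exp\left\{ -\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2 + \tau^2} \right\} $$
  The last expression is a normal density function; it follows because $\tau^* = \sigma\tau / \sqrt{\sigma^2 + \tau^2}$ when $n = 1$
• Marginal likelihood for $X$ is a normal distribution with mean $\mu$ and variance $\sigma^2 + \tau^2$
  • Mean of the predictive distribution for $X$ is $\mu$, the mean of the prior distribution for $\Theta$
  • Variance of the predictive distribution for $X$ is the sum of the prior variance of $\Theta$ and the variance of $X$ given $\Theta$
  • Standard deviation of the predictive distribution for $X$ is $\sqrt{\sigma^2 + \tau^2}$

Marginal Likelihood for Sample Mean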
• Assume $X_1, \dots, X_n$ have a normal distribution with mean $\Theta$ and variance $\sigma^2$
• Assume the standard deviation $\sigma$ is known, and $\Theta$ has a normal prior distribution with mean $\mu$ and variance $\tau^2$
• From properties of the normal distribution, we find:
  • Conditional on $\Theta = \theta$, the sample mean $\sum_i X_i / n$ is normally distributed with mean $\theta$ and variance $\sigma^2 / n$
  • Marginally, integrating over $\theta$, the sample mean $\sum_i X_i / n$ is normally distributed with:
    • Mean $\mu$
    • Variance $\sigma^2 / n + \tau^2$
    • Standard deviation $\sqrt{\sigma^2 / n + \tau^2}$
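A quick Monte Carlo check can make the marginal result concrete: draw $\theta$ from the prior, draw $n$ observations given $\theta$, and record the sample mean. The sketch below is an illustration only. The function name `simulate_sample_means`, the seed, and the number of draws are arbitrary choices, and the parameter values are borrowed from the reaction-time example in this unit.

```python
import math
import random
import statistics


def simulate_sample_means(mu, tau, sigma, n, draws, seed=0):
    """Simulate sample means by first drawing theta from the prior."""
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        theta = rng.gauss(mu, tau)                        # theta ~ Normal(mu, tau)
        xs = [rng.gauss(theta, sigma) for _ in range(n)]  # X_i | theta ~ Normal(theta, sigma)
        means.append(sum(xs) / n)
    return means


# Values borrowed from the reaction-time example
mu, tau, sigma, n = 5.69, 0.057, 0.127, 5

means = simulate_sample_means(mu, tau, sigma, n, draws=20000)
mc_mean = statistics.fmean(means)
mc_sd = statistics.stdev(means)

# Theoretical marginal SD of the sample mean: sqrt(sigma^2 / n + tau^2)
theory_sd = math.sqrt(sigma**2 / n + tau**2)

# Approximate 95% predictive interval using the 1.96 normal multiplier
interval = (mu - 1.96 * theory_sd, mu + 1.96 * theory_sd)

print(mc_mean, mc_sd, theory_sd, interval)
```

The simulated mean and standard deviation should land close to $\mu$ and $\sqrt{\sigma^2/n + \tau^2}$. The simulated spread is wider than $\sigma/\sqrt{n}$ alone because it also reflects uncertainty about $\Theta$.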
Using the Normal-Normal Marginal Likelihood to Predict the Mean of the Next 5 Observations
• Our current information about $\Theta$ is expressed by a normal distribution with mean $\mu = 5.69$, variance $\tau^2 = 0.127^2 / 5 = 0.0032$, std. dev. $\tau = 0.057$
• The predictive distribution for the sample mean of the next 5 observations is a normal distribution with:
  • Mean $\mu = 5.69$
  • Variance $\sigma^2 / n + \tau^2 = 0.127^2 / 5 + 0.0032 = 0.0064$
  • Standard deviation 0.08
• A 95% predictive credible interval for the sample mean of the next 5 observations is [5.53, 5.85]
  • This interval includes both uncertainty about $\Theta$ and uncertainty about the sample mean given $\Theta$
• Average of next 5 observations is 5.754
  • This value falls well within the expected range
[Figure: Comparison of Bayesian predictive distribution with prediction based on point estimate; legend: Bayesian Predictive, Predict Obs 6-10 (Mean=5.69), Avg Obs 6-10; horizontal axis Log Reaction Time from 5.4 to 6, vertical axis Probability Density]

Updating After Next Batch of Reaction Times
• The observations are normal with unknown mean $\Theta$ and standard deviation $\sigma = 0.127$ (assumed known and equal to sample SD)
• We use the posterior distribution from our first batch of observations as our prior distribution, Normal($\mu$, $\tau$)
  • Prior mean: $\mu = 5.69$
  • Prior standard deviation: $\tau = 0.057$
• Data: 5.793, 5.697, 5.875, 5.677, 5.730 (average is 5.754)
[Figure: Triplot for Obs 6-10; curves: Prior, NormLik, Posterior; vertical axis Probability Density]
• The posterior distribution is normal with:
  • Posterior mean: $\mu^* = \frac{5.69 / 0.057^2 + \sum x_i / 0.127^2}{1 / 0.057^2 + 5 / 0.127^2} = 5.723$