Bayesian Estimation & Information Theory
Jonathan Pillow
Mathematical Tools for Neuroscience (NEU 314) Spring, 2016
lecture 18: Bayesian Estimation
three basic ingredients:
1. Likelihood
2. Prior
3. Loss function $L(\hat\theta, \theta)$: the “cost” of making estimate $\hat\theta$ if the true value is $\theta$
• the likelihood and prior jointly determine the posterior
• together, the three ingredients fully specify how to generate an estimate from the data
Bayesian estimator is defined as:

$$\hat\theta(m) = \arg\min_{\hat\theta} \int L(\hat\theta, \theta)\, p(\theta \mid m)\, d\theta$$

where the integral (the expected loss under the posterior) is known as the “Bayes’ risk”.
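(Not on the original slides.) As a concrete illustration, a minimal Python sketch that approximates this definition on a grid: it evaluates the Bayes’ risk of every candidate estimate and returns the minimizer. The function name and gridding scheme are assumptions for illustration.

```python
import numpy as np

def bayes_estimator(theta_grid, posterior, loss):
    """Approximate the Bayes estimator on a grid: return the candidate
    estimate minimizing the expected posterior loss (the Bayes' risk)."""
    posterior = posterior / np.trapz(posterior, theta_grid)   # normalize p(theta|m)
    risks = [np.trapz(loss(th_hat, theta_grid) * posterior, theta_grid)
             for th_hat in theta_grid]                        # risk of each candidate
    return theta_grid[np.argmin(risks)]
```

Any of the loss functions below can be plugged in as `loss`.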
Typical Loss functions and Bayesian estimators

1. Squared error loss: $L(\hat\theta, \theta) = (\hat\theta - \theta)^2$

We need to find the $\hat\theta$ minimizing the expected loss:

$$E[L] = \int (\hat\theta - \theta)^2\, p(\theta \mid m)\, d\theta$$

Differentiate with respect to $\hat\theta$ and set to zero:

$$\frac{\partial}{\partial \hat\theta} \int (\hat\theta - \theta)^2\, p(\theta \mid m)\, d\theta = 2 \int (\hat\theta - \theta)\, p(\theta \mid m)\, d\theta = 0 \;\;\Rightarrow\;\; \hat\theta = \int \theta\, p(\theta \mid m)\, d\theta$$

This is the “posterior mean”, also known as the Bayes’ Least Squares (BLS) estimator.
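To check the derivation numerically (my own illustration, using the `bayes_estimator` sketch above with an arbitrary skewed posterior):

```python
theta = np.linspace(0, 20, 4001)                  # grid over theta
post = theta**2 * np.exp(-theta)                  # unnormalized Gamma(3,1) posterior
post /= np.trapz(post, theta)

sq_loss = lambda th_hat, th: (th_hat - th)**2
print(bayes_estimator(theta, post, sq_loss))      # ~ 3.0
print(np.trapz(theta * post, theta))              # posterior mean, also ~ 3.0
```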
2. “Zero-one” loss: $L(\hat\theta, \theta) = 1 - \delta(\hat\theta - \theta)$ (equal to 1 unless $\hat\theta = \theta$)

Expected loss:

$$E[L] = \int \big(1 - \delta(\hat\theta - \theta)\big)\, p(\theta \mid m)\, d\theta = 1 - p(\hat\theta \mid m)$$

which is minimized by:

$$\hat\theta = \arg\max_{\theta}\, p(\theta \mid m)$$

• the posterior maximum (or “mode”)
• known as the maximum a posteriori (MAP) estimate

MAP vs. Posterior Mean estimate:
[Figure: a skewed gamma pdf (x-axis 0–10, y-axis 0–0.3) with the posterior mode and posterior mean falling at different points]

Note: the posterior maximum and posterior mean are not always the same!
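The pdf in the figure isn’t fully specified; continuing the assumed Gamma(3,1) sketch from above, the two estimates differ noticeably:

```python
# For a skewed posterior, mode and mean come apart:
map_est = theta[np.argmax(post)]              # posterior mode (MAP): ~ 2.0
bls_est = np.trapz(theta * post, theta)       # posterior mean (BLS): ~ 3.0
print(map_est, bls_est)                       # skew pulls the mean above the mode
```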
Typical Loss functions and Bayesian estimators

3. “L1” loss: $L(\hat\theta, \theta) = |\hat\theta - \theta|$

Expected loss:

$$E[L] = \int |\hat\theta - \theta|\, p(\theta \mid m)\, d\theta$$

HW problem: What is the Bayesian estimator for this loss function?

Simple Example: Gaussian noise & prior
1. Likelihood: additive Gaussian noise, $m = \theta + n$ with $n \sim \mathcal{N}(0, \sigma_n^2)$, so $p(m \mid \theta) = \mathcal{N}(m;\, \theta, \sigma_n^2)$
2. Prior: zero-mean Gaussian, $p(\theta) = \mathcal{N}(\theta;\, 0, \sigma_p^2)$
3. Loss function: doesn’t matter (all estimators agree here: the posterior is Gaussian, so its mean, mode, and median coincide)

Posterior distribution (a product of Gaussians is Gaussian):

$$p(\theta \mid m) \propto p(m \mid \theta)\, p(\theta) = \mathcal{N}\!\left(\theta;\; \frac{\sigma_p^2}{\sigma_p^2 + \sigma_n^2}\, m,\; \frac{\sigma_n^2\, \sigma_p^2}{\sigma_n^2 + \sigma_p^2}\right)$$

MAP estimate: $\hat\theta = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_n^2}\, m$; posterior variance: $\left(\frac{1}{\sigma_n^2} + \frac{1}{\sigma_p^2}\right)^{-1}$
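A quick numerical check of the closed-form posterior mean (the specific variances and measurement below are assumptions for illustration):

```python
import numpy as np

sig_n, sig_p, m = 1.0, 2.0, 4.0                    # noise std, prior std, measurement
theta = np.linspace(-8, 8, 4001)
post = (np.exp(-(m - theta)**2 / (2 * sig_n**2))   # likelihood p(m|theta)
        * np.exp(-theta**2 / (2 * sig_p**2)))      # times prior p(theta)
post /= np.trapz(post, theta)                      # normalized posterior

print(np.trapz(theta * post, theta))               # posterior mean from the grid
print(sig_p**2 / (sig_p**2 + sig_n**2) * m)        # closed form: 0.8 * 4 = 3.2
```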
Likelihood

[Figure, built up over several slides: the likelihood $p(m \mid \theta)$ plotted in the $(\theta, m)$ plane, with both axes running from −8 to 8]

Prior

[Figure: the zero-mean Gaussian prior $p(\theta)$ on the same axes]

Computing the posterior
[Figure: likelihood × prior ∝ posterior, each plotted as a function of θ]

Making a Bayesian Estimate:
[Figure: for a measurement m*, the posterior (likelihood × prior) peaks between the likelihood peak and the prior peak at 0; the shift away from the likelihood peak is labeled “bias”]

High Measurement Noise: large bias
[Figure: a broad likelihood (noisy measurement) yields a posterior pulled strongly toward the prior: larger bias]

Low Measurement Noise: small bias
[Figure: a narrow likelihood (reliable measurement) yields a posterior that stays near the likelihood: small bias]

Bayesian Estimation:
• Likelihood and prior combine to form the posterior
• The Bayesian estimate is always biased toward the prior (relative to the ML estimate)
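Continuing the illustrative Gaussian example from above, the size of this bias falls directly out of the shrinkage factor (numbers assumed for illustration):

```python
sig_p, m = 2.0, 4.0                                # prior std and measurement, as above
for sig_n in (0.5, 1.0, 4.0):                      # low -> high measurement noise
    w = sig_p**2 / (sig_p**2 + sig_n**2)           # shrinkage toward the prior mean (0)
    print(f"sig_n={sig_n}: estimate={w * m:.2f}, bias={(1 - w) * m:.2f}")
# more noise -> smaller w -> estimate pulled further toward 0 (larger bias)
```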
Application #1: Biases in Motion Perception

[Demo: drifting gratings of differing contrast, viewed while fixating a central cross]

Which grating moves faster?

Explanation from Weiss, Simoncelli & Adelson (2002):
[Figure: velocity likelihoods for high- and low-contrast gratings, each multiplied by a prior peaked at zero velocity to give the posterior]

Noisier measurements (lower contrast), so the likelihood is broader ⇒ the posterior has a larger shift toward 0 (prior = no motion)
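A minimal sketch of this explanation, assuming a Gaussian zero-motion prior and a likelihood whose width grows as contrast falls (all numbers illustrative, not fit to data):

```python
v_true, sig_prior = 5.0, 2.0                       # true speed; width of zero-motion prior
for label, sig_like in (("high contrast", 0.5), ("low contrast", 3.0)):
    w = sig_prior**2 / (sig_prior**2 + sig_like**2)
    print(f"{label}: perceived speed ~ {w * v_true:.2f}")
# high contrast -> ~4.71 (near veridical); low contrast -> ~1.54 (looks much slower)
```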
• In the limit of a zero-contrast grating, likelihood becomes infinitely broad ⇒ percept goes to zero-motion.
• Claim: this explains why people actually speed up when driving in fog!

summary
• 3 ingredients for Bayesian estimation (prior, likelihood, loss function)
• Bayes’ least squares (BLS) estimator = posterior mean
• maximum a posteriori (MAP) estimator = posterior mode
• accounts for stimulus-quality dependent bias in motion perception (Weiss, Simoncelli & Adelson 2002)