Bayesian Estimation & Information Theory

Jonathan Pillow

Mathematical Tools for Neuroscience (NEU 314) Spring, 2016

Lecture 18: Bayesian Estimation

three basic ingredients:

1. Likelihood
2. Prior
   (likelihood and prior jointly determine the posterior)
3. Loss function L(θ̂, θ): the “cost” of making an estimate θ̂ if the true value is θ

• fully specifies how to generate an estimate from the data

Bayesian estimator is defined as:

θ̂(m) = arg min_θ̂ ∫ L(θ̂, θ) p(θ | m) dθ        (the integral is the “Bayes’ risk”)
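A minimal numerical sketch of this definition (not from the original slides; the skewed example posterior and the θ grid are arbitrary choices for illustration): discretize θ, compute the expected loss for each candidate θ̂ on the grid, and take the arg min.

    import numpy as np

    # Grid of theta values and an example posterior p(theta | m).
    # The skewed shape here is arbitrary, chosen only for illustration.
    theta = np.linspace(0, 20, 2001)
    post = theta * np.exp(-theta / 2.0)          # unnormalized
    post /= np.trapz(post, theta)                # normalize on the grid

    def bayes_estimate(loss):
        """Return the theta_hat (from the same grid) minimizing the Bayes risk:
        the integral of L(theta_hat, theta) p(theta | m) over theta."""
        risk = np.array([np.trapz(loss(th, theta) * post, theta) for th in theta])
        return theta[np.argmin(risk)]

    # Squared-error loss should recover (approximately) the posterior mean.
    print(bayes_estimate(lambda th_hat, th: (th_hat - th) ** 2))
    print(np.trapz(theta * post, theta))         # posterior mean, for comparison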

Typical Loss functions and Bayesian estimators

1. L(θ̂, θ) = (θ̂ − θ)²        “squared error” loss

need to find the θ̂ minimizing the expected loss:

E[L] = ∫ (θ̂ − θ)² p(θ | m) dθ

Differentiate with respect to θ̂ and set to zero:

d/dθ̂ ∫ (θ̂ − θ)² p(θ | m) dθ = 2 ∫ (θ̂ − θ) p(θ | m) dθ = 0    ⇒    θ̂ = ∫ θ p(θ | m) dθ        (“posterior mean”)

also known as the Bayes’ Least Squares (BLS) estimator

Typical Loss functions and Bayesian estimators

2. L(θ̂, θ) = 1 − δ(θ̂ − θ)        “zero-one” loss (equal to 1 unless θ̂ = θ)

expected loss:

E[L] = ∫ (1 − δ(θ̂ − θ)) p(θ | m) dθ = 1 − p(θ̂ | m)

which is minimized by:

θ̂ = arg max_θ p(θ | m)

• posterior maximum (or “mode”)
• known as the maximum a posteriori (MAP) estimate

MAP vs. Posterior Mean estimate:

[Figure: gamma pdf plotted over 0–10, whose mode and mean fall in different places]

Note: posterior maximum and mean not always the same!
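A small numerical check of this note (the gamma shape and scale below are arbitrary, chosen only for illustration): for a skewed, gamma-shaped posterior the mode (MAP estimate) and mean (BLS estimate) land in different places.

    import numpy as np

    # Gamma-shaped stand-in for a posterior p(theta | m); parameters are illustrative.
    a, s = 2.0, 3.0
    theta = np.linspace(0, 30, 3001)
    post = theta ** (a - 1) * np.exp(-theta / s)   # unnormalized gamma density
    post /= np.trapz(post, theta)                  # normalize on the grid

    theta_map = theta[np.argmax(post)]             # posterior mode  -> MAP estimate
    theta_bls = np.trapz(theta * post, theta)      # posterior mean  -> BLS estimate

    print(theta_map)   # ~3.0  (analytic mode: (a - 1) * s)
    print(theta_bls)   # ~6.0  (analytic mean:  a * s)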

Typical Loss functions and Bayesian estimators

3. L(θ̂, θ) = |θ̂ − θ|        “L1” loss

expected loss:

E[L] = ∫ |θ̂ − θ| p(θ | m) dθ

HW problem: What is the Bayesian estimator for this loss function?

Simple Example: Gaussian noise & prior

1. Likelihood: additive Gaussian noise, so p(m | θ) = N(m; θ, σ²)

2. Prior: zero-mean Gaussian, p(θ) = N(θ; 0, σ_prior²)

3. Loss function: doesn’t matter (all of the above loss functions agree here, since the posterior is Gaussian and its mean, mode, and median coincide)

posterior distribution: p(θ | m) ∝ p(m | θ) p(θ), which is again Gaussian

MAP estimate (= posterior mean here): θ̂ = m · σ_prior² / (σ_prior² + σ²), the measurement shrunk toward the prior mean 0 (see the check below)
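A quick numerical check of the shrinkage formula; the values of m, σ, and σ_prior below are arbitrary illustrations, not numbers from the lecture.

    # Gaussian likelihood + zero-mean Gaussian prior (illustrative values).
    m, sigma, sigma_prior = 3.0, 1.0, 2.0

    # Posterior is Gaussian; its mean (= MAP = BLS estimate) shrinks m toward 0:
    shrinkage = sigma_prior**2 / (sigma_prior**2 + sigma**2)
    theta_hat = shrinkage * m
    print(theta_hat)   # 2.4 -- pulled from the measurement m = 3 toward the prior mean 0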

Likelihood

[Figure: likelihood p(m | θ) in the (θ, m) plane, both axes from −8 to 8, built up over several slides]

Prior

[Figure: prior p(θ) over the same θ axis]

Computing the posterior

[Figure: likelihood × prior ∝ posterior, each plotted as a function of θ for a fixed measurement m]
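A minimal grid-based sketch of this multiply-and-normalize step, reusing the Gaussian likelihood and prior from the example above (values again illustrative):

    import numpy as np

    m, sigma, sigma_prior = 3.0, 1.0, 2.0
    theta = np.linspace(-8, 8, 4001)

    likelihood = np.exp(-0.5 * (m - theta)**2 / sigma**2)   # p(m | theta), up to a constant
    prior      = np.exp(-0.5 * theta**2 / sigma_prior**2)   # p(theta), up to a constant
    posterior  = likelihood * prior                         # proportional to p(theta | m)
    posterior /= np.trapz(posterior, theta)                 # normalize on the grid

    print(np.trapz(theta * posterior, theta))   # posterior mean, ~2.4 (matches the shrinkage formula)
    print(theta[np.argmax(posterior)])          # posterior mode, same value here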

Making a Bayesian Estimate:

[Figure: likelihood (around the measurement m*), prior, and posterior; the Bayesian estimate is shifted from the likelihood peak toward the prior, and this shift is the bias]

High Measurement Noise: large bias

[Figure: with a broad likelihood (noisy measurement), the posterior is pulled strongly toward the prior — larger bias]

Low Measurement Noise: small bias

[Figure: with a narrow likelihood (precise measurement), the posterior stays close to the measurement — small bias]

Bayesian Estimation:

• Likelihood and prior combine to form the posterior
• the Bayesian estimate is always biased toward the prior (i.e., shifted away from the ML estimate) — see the sketch below
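A short sketch of the point made on the last two slides, using the Gaussian shrinkage formula with made-up numbers: the same measurement is biased more strongly toward the prior when the measurement noise is larger.

    m, sigma_prior = 3.0, 2.0
    for sigma in (3.0, 0.5):                     # high vs. low measurement noise
        theta_hat = m * sigma_prior**2 / (sigma_prior**2 + sigma**2)
        print(sigma, theta_hat, m - theta_hat)   # bias (m - theta_hat) shrinks as noise shrinks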

Application #1: Biases in Motion Perception

[Figure: two drifting gratings at different contrasts, with a central fixation cross]

Which grating moves faster?

Explanation from Weiss, Simoncelli & Adelson (2002):

[Figure: likelihood, prior (peaked at zero velocity), and posterior, shown for a broader (noisier) and a narrower likelihood]

Noisier measurements, so likelihood is broader ⇒ posterior has a larger shift toward 0 (prior = no motion)

• In the limit of a zero-contrast grating, likelihood becomes infinitely broad ⇒ percept goes to zero-motion.

• Claim: explains why people actually speed up when driving in fog! (see the toy sketch below)
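A toy sketch of the Weiss, Simoncelli & Adelson idea; the Gaussian forms and all numbers below are assumptions for illustration, not the model or parameters from the paper. A prior peaked at zero velocity, combined with a likelihood that broadens as contrast falls, yields a lower perceived (posterior-mean) speed at low contrast.

    import numpy as np

    true_speed, v_prior_sd = 5.0, 2.0             # illustrative values
    v = np.linspace(-20, 40, 6001)
    prior = np.exp(-0.5 * v**2 / v_prior_sd**2)   # "slow-speed" prior centered at 0

    for contrast, like_sd in (("high", 1.0), ("low", 6.0)):   # low contrast -> broader likelihood
        like = np.exp(-0.5 * (v - true_speed)**2 / like_sd**2)
        post = like * prior
        post /= np.trapz(post, v)
        print(contrast, np.trapz(v * post, v))    # perceived (posterior-mean) speed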

summary

• 3 ingredients for Bayesian estimation (prior, likelihood, loss)
• Bayes’ least squares (BLS) estimator (posterior mean)
• maximum a posteriori (MAP) estimator (posterior mode)
• accounts for stimulus-quality-dependent bias in motion perception (Weiss, Simoncelli & Adelson 2002)