Bayesian Inference

Petteri Piiroinen (University of Helsinki), Spring 2020

3 Summarizing the posterior distribution

In principle, the posterior distribution contains all the information about the possible parameter values. In practice, we must also present the posterior distribution somehow. For one- and two-dimensional parameters we can plot it, for example as a scatterplot or a histogram of simulated values. For higher-dimensional cases, we can study the marginal posterior distributions of some of the parameters.

We can also use the usual summary statistics that are used to summarize probability distributions: the mean, median, mode, variance, standard deviation and different quantiles.

3.1 Credible intervals

A credible interval is a "Bayesian confidence interval". It has a very intuitive interpretation: we can say that "a 95% credible interval actually contains the true parameter value with 95% probability!" Here the probability is the posterior probability, given the observed data.

3.1.1 Credible interval definition

Definition. A $(1 - \alpha)$-credible set is a subset $I_\alpha \subset \Omega \subset \mathbb{R}^d$ containing a proportion $1 - \alpha$ of the probability mass of the posterior distribution:

$$P(\Theta \in I_\alpha \mid Y = y) = 1 - \alpha.$$

If the set is a region, we call it a credible region, and if in addition $d = 1$, we call the credible region a credible interval.

Usually we talk about a $(1 - \alpha) \cdot 100\%$ credible interval; for example, if the confidence level is $\alpha = 0.05$, we talk about the 95% credible interval.

3.1.2 Equal-tailed interval

Definition. An equal-tailed interval (also called a central interval) of confidence level $\alpha$ is an interval

$$I_\alpha = [q_{\alpha/2},\, q_{1 - \alpha/2}],$$

where $q_z$ is a $z$-quantile of the posterior distribution $p(\cdot \mid y)$.
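When the posterior is available only through simulated values, the summary statistics and the equal-tailed interval can be estimated directly from the draws. The following is a minimal sketch in R. It assumes a hypothetical Gamma(3, 2) posterior, which merely stands in for whatever sampler produced the draws; the variable names are illustrative.

```r
# Hypothetical posterior: Gamma(shape = 3, rate = 2), used only for illustration.
set.seed(1)
draws <- rgamma(10000, shape = 3, rate = 2)  # simulated posterior values

alpha_conf <- 0.05

# Usual summary statistics, estimated from the draws
post_mean   <- mean(draws)
post_median <- median(draws)
post_sd     <- sd(draws)

# Equal-tailed interval: empirical alpha/2 and 1 - alpha/2 quantiles
ci <- quantile(draws, probs = c(alpha_conf / 2, 1 - alpha_conf / 2))

# Proportion of draws inside the interval; should be close to 1 - alpha
coverage <- mean(draws >= ci[1] & draws <= ci[2])
```

By construction, `coverage` is approximately 0.95, which is the defining property of a $(1 - \alpha)$-credible set.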
Since we have assumed the parameter to be continuous, the quantiles of $p(\cdot \mid y)$ are well defined. Only an interval of zero density in the middle of the support might cause issues, because a quantile is then not unique.

If we can solve the posterior distribution in closed form, the quantiles can be obtained via the quantile function of the posterior distribution:

$$P(\Theta \le q_z \mid Y = y) = z \iff q_z = F_{\Theta \mid Y}^{-1}(z \mid y).$$

This quantile function $F_{\Theta \mid Y}^{-1}$ is the inverse of the cumulative distribution function (cdf) $F_{\Theta \mid Y}$ of the posterior distribution. If the posterior density has zeros, the quantile function needs to be defined in another way.

Usually, when a credible interval is mentioned without specifying its type, an equal-tailed interval is meant. However, unless the posterior distribution is unimodal and symmetric, we might prefer the highest posterior density criterion for choosing the credible interval. But first, an example.

3.1.3 Numerical example

This is a continuation of the Poisson-gamma Example 2.1.1. We have observed a data set y = (4, 3, 11, 3, 6).

Model (sampling distribution / likelihood): $Y_1, \dots, Y_n \mid \lambda \sim \text{Poisson}(\lambda)$, conditionally independent given $\lambda$.

Prior: a gamma distribution with hyperparameters $\alpha = \beta = 1$, that is, $\lambda \sim \text{Gamma}(1, 1)$.

We want to compute a 95% credible interval for the parameter $\lambda$.

Data, hyperparameters and the confidence level:

    y <- c(4, 3, 11, 3, 6)
    n <- length(y)
    alpha <- 1
    beta <- 1
    alpha_conf <- 0.05

The posterior distribution of $\lambda$ is

$$\lambda \mid Y = y \sim \text{Gamma}(n\bar{y} + \alpha,\; n + \beta).$$
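The conjugate update can be checked by multiplying the Poisson likelihood by the gamma prior and dropping factors that do not depend on $\lambda$. The short derivation below assumes the shape-rate parameterization of the gamma distribution, which is consistent with the prior expectation $\alpha/\beta$ used in these slides.

```latex
\begin{align*}
p(\lambda \mid y)
  &\propto p(y \mid \lambda)\, p(\lambda) \\
  &\propto \left( \prod_{i=1}^{n} \lambda^{y_i} e^{-\lambda} \right)
           \lambda^{\alpha - 1} e^{-\beta \lambda} \\
  &= \lambda^{n\bar{y} + \alpha - 1}\, e^{-(n + \beta)\lambda},
\end{align*}
```

This is the kernel of a $\text{Gamma}(n\bar{y} + \alpha,\, n + \beta)$ density. For the data $y = (4, 3, 11, 3, 6)$ we have $n\bar{y} = 27$ and $n = 5$, so with $\alpha = \beta = 1$ the posterior is $\text{Gamma}(28, 6)$.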
    alpha_1 <- sum(y) + alpha  # posterior shape: n * mean(y) + alpha
    beta_1 <- n + beta         # posterior rate: n + beta

The quantiles are hence:

    q_lower <- qgamma(alpha_conf / 2, alpha_1, beta_1)
    q_upper <- qgamma(1 - alpha_conf / 2, alpha_1, beta_1)
    c(q_lower, q_upper)
    ## [1] 3.100966 6.547264

[Figure: prior (orange) and posterior (violet) densities p(λ|y) plotted against λ on the range 0 to 7, with the 95% equal-tailed credible interval shaded in pink.]

The figure is drawn with the following code:

    lambda <- seq(0, 7, by = 0.001)  # set up grid for plotting
    lambda_true <- 3                 # marked with a dashed vertical line below

    # posterior density
    plot(lambda, dgamma(lambda, alpha_1, beta_1), type = 'l', lwd = 2,
         col = 'violet', ylim = c(0, 1.5), xlab = expression(lambda),
         ylab = expression(paste('p(', lambda, '|y)')))

    # polygon covering the credible interval under the posterior density
    y_val <- dgamma(lambda, alpha_1, beta_1)
    x_coord <- c(q_lower, lambda[lambda >= q_lower & lambda <= q_upper], q_upper)
    y_coord <- c(0, y_val[lambda >= q_lower & lambda <= q_upper], 0)
    polygon(x_coord, y_coord, col = 'pink', lwd = 2, border = 'violet')

    abline(v = lambda_true, lty = 2)

    # prior density
    lines(lambda, dgamma(lambda, alpha, beta), type = 'l', lwd = 2, col = 'orange')
    legend('topright', inset = .02, legend = c('prior', 'posterior'),
           col = c('orange', 'violet'), lwd = 2)

[Figure: four panels (prior, n = 1, n = 2, n = 5), each on the horizontal range 0 to 7 and the vertical range 0 to 3.]
[Figure: four further panels (n = 10, n = 50, n = 100, n = 200), each on the horizontal range 0 to 7 and the vertical range 0 to 3.]

3.1.3 Numerical example: prior effects

As we observe more data, the credible interval gets narrower. In the figures, the orange area is the credible interval computed using the prior distribution.

The same sequence of plots with a stronger prior $\lambda \sim \text{Gamma}(10, 10)$, which has the same expectation $E\lambda = \alpha/\beta = 1$:

[Figure: eight panels (prior, n = 1, n = 2, n = 5, n = 10, n = 50, n = 100, n = 200), each on the horizontal range 0 to 7 and the vertical range 0 to 3.]

3.1.4 Highest posterior density region

Definition. A highest posterior density (HPD) set of confidence level $\alpha$ is a $(1 - \alpha)$-credible set $I_\alpha$ for which the posterior density at every point in the set is at least as high as the posterior density at any point outside the set:

$$f_{\Theta \mid Y}(\theta \mid y) \ge f_{\Theta \mid Y}(\theta' \mid y)$$

for all $\theta \in I_\alpha$ and $\theta' \notin I_\alpha$.

This means that a $(1 - \alpha)$-HPD set is a smallest possible $(1 - \alpha)$-credible set.
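For a unimodal posterior, the HPD set is the shortest interval with posterior mass $1 - \alpha$, so it can be found numerically. The following is a minimal sketch in R, assuming the Gamma(28, 6) posterior that the data y = (4, 3, 11, 3, 6) and the Gamma(1, 1) prior of the numerical example give. The helper `interval_width` and the use of `optimize()` are illustrative choices and are not taken from the slides.

```r
# Posterior of the numerical example: Gamma(sum(y) + alpha, n + beta)
y <- c(4, 3, 11, 3, 6)
alpha_1 <- sum(y) + 1      # posterior shape
beta_1 <- length(y) + 1    # posterior rate
alpha_conf <- 0.05

# Width of the interval [q_p, q_{p + 1 - alpha}], which always has mass 1 - alpha
interval_width <- function(p) {
  qgamma(p + 1 - alpha_conf, alpha_1, beta_1) - qgamma(p, alpha_1, beta_1)
}

# Search over the lower tail probability p for the shortest such interval
opt <- optimize(interval_width,
                interval = c(1e-6, alpha_conf - 1e-6), tol = 1e-8)

hpd <- qgamma(c(opt$minimum, opt$minimum + 1 - alpha_conf), alpha_1, beta_1)
equal_tailed <- qgamma(c(alpha_conf / 2, 1 - alpha_conf / 2), alpha_1, beta_1)

rbind(hpd = hpd, equal_tailed = equal_tailed)
```

Both intervals contain 95% of the posterior mass, but the HPD interval is no wider than the equal-tailed one, and the posterior density is (numerically) equal at its two endpoints.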
The HPD set is not necessarily an interval (or a connected region in a higher-dimensional case): if the posterior distribution is multimodal, the HPD set of this distribution may be a union of distinct intervals (or distinct contiguous regions in a higher-dimensional case). This means that HPD sets are not always credible intervals or regions in the strict sense. However, it is very common to talk simply about HPD intervals, even though they may not always be intervals.

3.1.4 HPD - example

Let's create a bimodal example by mixing two beta distributions. Note: we have seen the mixing before! The mixture distribution of $Y$ is

$$Y \mid (\Theta = \theta_i) \sim \text{Beta}(\alpha_i, \beta_i),$$
$$\Theta \sim 1 + \text{Bernoulli}(\tfrac{1}{2}).$$

Therefore, the marginal likelihood is

$$f(y) = \frac{1}{2} \big( f(y \mid \theta_1) + f(y \mid \theta_2) \big).$$
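An equal-weight mixture of two beta densities makes it easy to see an HPD set that is a union of two intervals. The sketch below is in R and approximates the HPD set on a grid. The component parameters Beta(10, 30) and Beta(30, 10) are hypothetical values chosen only so that the modes are well separated, and the grid approach is an illustrative choice that is not taken from the slides.

```r
alpha_conf <- 0.05

# Grid over the support and the mixture density (hypothetical components)
theta <- seq(0, 1, by = 0.0005)
dens <- 0.5 * dbeta(theta, 10, 30) + 0.5 * dbeta(theta, 30, 10)

# Approximate probability mass of each grid point
mass <- dens / sum(dens)

# Add grid points in order of decreasing density until mass 1 - alpha is reached
ord <- order(dens, decreasing = TRUE)
n_keep <- which(cumsum(mass[ord]) >= 1 - alpha_conf)[1]
threshold <- dens[ord[n_keep]]   # density level that defines the HPD set
in_hpd <- dens >= threshold

# Convert the logical indicator into a list of interval endpoints
runs <- rle(in_hpd)
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
hpd_intervals <- cbind(lower = theta[starts[runs$values]],
                       upper = theta[ends[runs$values]])
hpd_intervals
```

With these components the result has two rows, one interval around each mode, so the 95% HPD set is not a single interval.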
