Bayesian Inference

Petteri Piiroinen, University of Helsinki, Spring 2020

3 Summarizing the posterior distribution

In principle, the posterior distribution contains all the information about the possible parameter values. In practice, we must also present the posterior distribution somehow:

  • plotting (for 1D and 2D cases): scatterplots, histograms of simulated values;
  • for higher-dimensional cases, studying the marginal posterior distributions of some of the parameters.

The usual summary statistics that are used to summarize probability distributions, such as the mean, median, mode, variance, standard deviation and different quantiles, can be used for the posterior as well.

3.1 Credible intervals

A credible interval is a "Bayesian confidence interval". It has a very intuitive interpretation: we can say that a 95% credible interval actually contains the true parameter value with 95% probability!

3.1.1 Credible interval definition

Definition. A $(1 - \alpha)$-credible set is a subset $I_\alpha \subset \Omega \subset \mathbb{R}^d$ containing a proportion $1 - \alpha$ of the probability mass of the posterior distribution:

  $P(\Theta \in I_\alpha \mid Y = y) = 1 - \alpha.$

If the set is a region, we call it a credible region, and if in addition $d = 1$, we call it a credible interval. Usually we talk about a $(1 - \alpha) \cdot 100\%$ credible interval; for example, if the confidence level is $\alpha = 0.05$, we talk about the 95% credible interval.

3.1.2 Equal-tailed interval

Definition. An equal-tailed interval (also called a central interval) of confidence level $\alpha$ is the interval

  $I_\alpha = [\, q_{\alpha/2},\; q_{1-\alpha/2} \,],$

where $q_z$ is the $z$-quantile of the posterior distribution $p(\cdot \mid y)$. Since we have assumed the parameter to be continuous, these quantiles are well defined (only zeros in the middle of the density might cause issues).

If we can solve the posterior distribution in closed form, the quantiles can be obtained via the quantile function of the posterior distribution:

  $P(\Theta \le q_z \mid Y = y) = z \iff q_z = F_{\Theta \mid Y}^{-1}(z \mid y).$

The quantile function $F_{\Theta \mid Y}^{-1}$ is the inverse of the cumulative distribution function (cdf) $F_{\Theta \mid Y}$ of the posterior distribution. If there are zeros in the posterior density, the quantile function needs to be defined in another way.

Usually, when a credible interval is mentioned without specifying its type, an equal-tailed interval is meant. However, unless the posterior distribution is unimodal and symmetric, we might prefer the highest posterior density criterion for choosing the credible interval. But before that, an example.
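If the posterior is represented by simulated draws instead of a closed form, the same equal-tailed interval can be read off the empirical quantiles of the draws. A minimal sketch in R (not from the slides), using the Gamma(28, 6) posterior derived in the example below as a stand-in:

# Equal-tailed 95% interval from simulated posterior draws: take the
# empirical 2.5% and 97.5% quantiles. Gamma(28, 6) is the posterior
# of the numerical example that follows.
draws <- rgamma(1e5, shape = 28, rate = 6)
quantile(draws, probs = c(0.025, 0.975))

With enough draws this agrees closely with the exact quantiles computed via qgamma below.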
3.1.3 Numerical example

Continuation of the Poisson-gamma Example 2.1.1. We have observed a data set $y = (4, 3, 11, 3, 6)$.

Model (sampling distribution / likelihood): $Y_1, \dots, Y_n \mid \lambda \sim \mathrm{Poisson}(\lambda)$, conditionally independent given $\lambda$.

Prior: a gamma distribution with hyperparameters $\alpha = \beta = 1$, that is, $\lambda \sim \mathrm{Gamma}(1, 1)$.

We want to compute a 95% credible interval for the parameter $\lambda$.

Data, hyperparameters and a confidence level:

y <- c(4, 3, 11, 3, 6)
n <- length(y)
alpha <- 1
beta <- 1
alpha_conf <- 0.05

The posterior distribution for $\lambda$ is

  $\lambda \mid Y \sim \mathrm{Gamma}(n\bar{y} + \alpha,\; n + \beta).$

alpha_1 <- sum(y) + alpha
beta_1 <- n + beta

The quantiles are hence:

q_lower <- qgamma(alpha_conf / 2, alpha_1, beta_1)
q_upper <- qgamma(1 - alpha_conf / 2, alpha_1, beta_1)
c(q_lower, q_upper)
## [1] 3.100966 6.547264

[Figure: prior (orange) and posterior (violet) densities $p(\lambda \mid y)$ on $\lambda \in [0, 7]$, with the 95% equal-tailed credible interval shaded in pink.]

The figure is produced as follows:

lambda <- seq(0, 7, by = 0.001)  # set up grid for plotting
lambda_true <- 3
plot(lambda, dgamma(lambda, alpha_1, beta_1), type = 'l', lwd = 2,
     col = 'violet', ylim = c(0, 1.5), xlab = expression(lambda),
     ylab = expression(paste('p(', lambda, '|y)')))
y_val <- dgamma(lambda, alpha_1, beta_1)
x_coord <- c(q_lower, lambda[lambda >= q_lower & lambda <= q_upper], q_upper)
y_coord <- c(0, y_val[lambda >= q_lower & lambda <= q_upper], 0)
polygon(x_coord, y_coord, col = 'pink', lwd = 2, border = 'violet')
abline(v = lambda_true, lty = 2)
lines(lambda, dgamma(lambda, alpha, beta), type = 'l', lwd = 2, col = 'orange')
legend('topright', inset = .02, legend = c('prior', 'posterior'),
       col = c('orange', 'violet'), lwd = 2)

[Figure: prior and posterior densities with shaded 95% credible intervals for sample sizes n = 1, 2, 5, 10, 50, 100, 200.]

3.1.3 Numerical example – prior effects

As we observe more data, the credible interval gets narrower. The orange area is the credible interval that is computed using the prior distribution alone. Consider now a stronger prior $\lambda \sim \mathrm{Gamma}(10, 10)$ (with the same expectation $E\lambda = \alpha/\beta = 1$).

[Figure: the same comparison under the Gamma(10, 10) prior, for n = 1, 2, 5.]
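A minimal sketch (not on the slides; the _s-suffixed names are hypothetical, and y and alpha_conf are reused from above) of the same conjugate update under the stronger prior; the figure panels around this code continue the comparison:

# Stronger prior Gamma(10, 10): same prior mean alpha/beta = 1,
# but more prior "observations", so the data move the posterior less.
alpha_s <- 10
beta_s <- 10
alpha_1s <- sum(y) + alpha_s    # posterior shape
beta_1s <- length(y) + beta_s   # posterior rate
c(qgamma(alpha_conf / 2, alpha_1s, beta_1s),
  qgamma(1 - alpha_conf / 2, alpha_1s, beta_1s))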
[Figure: the Gamma(10, 10) prior comparison continued for n = 10, 50, 100, 200.]

3.1.4 Highest posterior density region

Definition. A highest posterior density (HPD) set of confidence level $\alpha$ is a $(1 - \alpha)$-credible set $I_\alpha$ such that the posterior density of every point in this set is higher than the posterior density of any point outside of it:

  $f_{\Theta \mid Y}(\theta \mid y) \ge f_{\Theta \mid Y}(\theta' \mid y) \quad \text{for all } \theta \in I_\alpha \text{ and } \theta' \notin I_\alpha.$

This means that a $(1 - \alpha)$-highest posterior density set is the smallest possible $(1 - \alpha)$-credible set.

An HPD set is not necessarily an interval (or a connected region in a higher-dimensional case): if the posterior distribution is multimodal, the HPD set may be a union of distinct intervals (or of distinct contiguous regions in a higher-dimensional case). This means that HPD sets are not always strictly credible intervals or regions. However, it is very common to talk simply about HPD intervals, even though they may not always be intervals.

3.1.4 HPD - example

Let's create a bimodal example (by mixing two beta distributions). Note: we have seen mixing before! The mixture distribution of $Y$ is

  $Y \mid (\Theta = \theta_i) \sim \mathrm{Beta}(\alpha_i, \beta_i), \qquad \Theta \sim 1 + \mathrm{Bernoulli}(\tfrac{1}{2}).$

Therefore, the marginal likelihood is

  $f(y) = \tfrac{1}{2}\, f(y \mid \theta_1) + \tfrac{1}{2}\, f(y \mid \theta_2).$
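The slides develop this example further. As a complementary sketch (not from the slides), an HPD set for such a bimodal density can be found numerically by keeping the highest-density grid points until they cover $1 - \alpha$ of the probability mass; the component parameters Beta(5, 25) and Beta(25, 5) below are illustrative assumptions:

# Sketch: numerical 95% HPD set for a bimodal mixture of two betas.
# The component parameters (5, 25) and (25, 5) are illustrative choices.
f <- function(y) 0.5 * dbeta(y, 5, 25) + 0.5 * dbeta(y, 25, 5)
grid <- seq(0, 1, by = 1e-4)
dens <- f(grid)
# Visit grid points from highest density to lowest, and keep them
# until the accumulated mass (density times grid spacing) reaches 95%.
ord <- order(dens, decreasing = TRUE)
mass <- cumsum(dens[ord]) * 1e-4
keep <- sort(grid[ord[mass <= 0.95]])
# Because the density is bimodal, the kept points form two separate
# intervals (a union, not a single interval):
range(keep[keep < 0.5])  # piece around the first mode
range(keep[keep > 0.5])  # piece around the second mode

This illustrates why HPD "intervals" need not be intervals: here the 95% HPD set is a union of two disjoint intervals, one around each mode.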
