
STAT 618 Bayesian Statistics Lecture Notes
Michael Baron
American University
Department of Mathematics & Statistics
Office: DMTI 106-D
Office hours: Mon, Wed, Thu 4:00-5:00 pm
Phone: 202-885-3130
Email: [email protected]

Contents

1 Introduction to Decision Theory and Bayesian Philosophy 1
1.1 Decisions and Action Spaces ...... 1
1.2 Frequentist Approach ...... 2
1.3 Bayesian approach ...... 6
1.3.1 Prior and posterior ...... 7
1.4 Loss function ...... 9
1.4.1 Risks and optimal statistical decisions ...... 10

2 Review of Some Probability 15
2.1 Rules of Probability ...... 15
2.2 Discrete random variables ...... 17
2.2.1 Joint distribution and marginal distributions ...... 18
2.2.2 Expectation, variance, and covariance ...... 19
2.3 Discrete distribution families ...... 21
2.4 Continuous random variables ...... 23

3 Review of Basic Statistics 29
3.1 Parameters and their estimates ...... 29
3.1.1 Mean ...... 30
3.1.2 Median ...... 31
3.1.3 Quantiles ...... 33
3.1.4 Variance and standard deviation ...... 34
3.1.5 Method of moments ...... 35
3.1.6 Method of maximum likelihood ...... 36
3.2 Confidence intervals ...... 37
3.3 Testing Hypotheses ...... 39
3.3.1 Level α tests ...... 40
3.3.2 Z-tests ...... 41
3.3.3 T-tests ...... 44
3.3.4 Duality: two-sided tests and two-sided confidence intervals ...... 44
3.3.5 P-value ...... 44

4 Choice of a Prior Distribution 49
4.1 Subjective choice of a prior distribution ...... 50
4.2 Empirical Bayes solutions ...... 50
4.3 Conjugate priors ...... 51
4.3.1 Gamma family is conjugate to the Poisson model ...... 51
4.3.2 Beta family is conjugate to the Binomial model ...... 53
4.3.3 Normal family is conjugate to the Normal model ...... 53
4.3.4 Combining the data and the prior ...... 54
4.4 Non-informative prior distributions and generalized Bayes rules ...... 55
4.4.1 Invariant non-informative priors ...... 57

5 Loss Functions, Utility Theory, and Rao-Blackwellization 59
5.1 Utility Theory and Subjectively Chosen Loss Functions ...... 59
5.1.1 Construction of a utility function ...... 60
5.2 Standard loss functions and corresponding Bayes decision rules ...... 66
5.2.1 Squared-error loss ...... 66
5.2.2 Absolute-error loss ...... 67
5.2.3 Zero-one loss ...... 68
5.2.4 Other loss functions ...... 70
5.3 Convexity and the Rao-Blackwell Theorem ...... 71
5.3.1 Convex functions ...... 71
5.3.2 Jensen's Inequality ...... 72
5.3.3 Sufficient statistics ...... 73
5.3.4 Rao-Blackwell Theorem ...... 74

6 Intro to WinBUGS 75
6.1 Installation ...... 75
6.2 Basic Program: Model, Data, and Initialization ...... 76
6.2.1 Model specification ...... 76
6.3 Entering Data ...... 80
6.3.1 Initialization ...... 81

7 Bayesian Inference: Estimation, Hypothesis Testing, Prediction 83
7.1 Bayesian estimation ...... 83
7.1.1 Precision evaluation ...... 85
7.2 Bayesian credible sets ...... 87
7.3 Bayesian hypothesis testing ...... 91
7.3.1 Zero-w_i loss function ...... 92
7.3.2 Bayesian classification ...... 93
7.3.3 Bayes factors ...... 94
7.4 Bayesian prediction ...... 94

8 Monte Carlo methods 99
8.1 Applications and examples ...... 100
8.2 Estimating probabilities ...... 102
8.3 Estimating means and standard deviations ...... 104
8.4 Forecasting ...... 105
8.5 Monte Carlo integration ...... 105
8.6 Markov Chain Monte Carlo ...... 107
8.7 Gibbs Sampler ...... 108
8.8 Central Limit Theorem for the Posterior Distribution ...... 111
8.8.1 Preparation ...... 111
8.8.2 Main result, asymptotic normality ...... 112

9 Computer Generation of Random Variables 115
9.1 Random number generators ...... 115
9.2 Discrete methods ...... 116
9.3 Inverse transform method ...... 120
9.4 Rejection method ...... 123
9.5 Generation of random vectors ...... 125
9.6 Gibbs Sampler ...... 126
9.7 The Metropolis Algorithm ...... 129
9.8 The Metropolis-Hastings Algorithm ...... 130

10 Review of Markov Chains 131
10.1 Main definitions ...... 131
10.2 Markov chains ...... 133
10.3 Matrix approach ...... 137
10.4 Steady-state distribution ...... 141

11 Empirical and Hierarchical Bayes Analysis 149
11.1 Parametric Empirical Bayes Approach ...... 149
11.2 Nonparametric Empirical Bayes Approach ...... 150
11.3 Hierarchical Bayes Approach ...... 152

12 Decisions and Game Theory 153
12.1 General concepts ...... 153
12.2 Ann, Jim, and Mark problem ...... 154

13 Appendix 159
13.1 Inventory of distributions ...... 159
13.1.1 Discrete families ...... 159
13.1.2 Continuous families ...... 161
13.2 Distribution tables ...... 165
13.3 Calculus review ...... 173
13.3.1 Inverse function ...... 173
13.3.2 Limits and continuity ...... 173
13.3.3 Sequences and series ...... 174
13.3.4 Derivatives, minimum, and maximum ...... 175
13.3.5 Integrals ...... 176

Chapter 1

Introduction to Decision Theory and Bayesian Philosophy

1.1 Decisions and Action Spaces ...... 1
1.2 Frequentist Approach ...... 2
1.3 Bayesian approach ...... 6
1.3.1 Prior and posterior ...... 7
1.4 Loss function ...... 9
1.4.1 Risks and optimal statistical decisions ...... 10

Welcome to the new semester! And to this course, where we discuss the main concepts and methods of

– Statistical decision theory

– Bayesian modeling

– Bayesian decision making - estimation, hypothesis testing, and forecasting

– Bayesian modeling and computation using R and BUGS

– Minimax approach and game theory

First, let us introduce the main concepts and discuss how this course is different from other Statistics courses that you took, will take, or ... will not take.

1.1 Decisions and Action Spaces

In practically all Statistics courses, we learn how to make decisions under uncertainty. Formally, we are looking for a decision δ that belongs to an action space A, the set of all possible decisions that we are allowed to take.

Example 1.1 (Estimation). For example, in estimation of parameters, our decision is an estimator δ = θ̂ of some unknown parameter θ, and the action space A consists of all possible values of this parameter. If we are estimating the mean of a population, A = (−∞, ∞). But if we are estimating its variance, then A = (0, ∞). To estimate the proportion of voters supporting a certain candidate, we should take A = [0, 1]. ♦

Example 1.2 (Testing). When we test a hypothesis, at the end of the day we have to either accept or reject it. Then the action space consists of just two elements,

A = { accept the null hypothesis H0, reject the hypothesis H0 in favor of the alternative HA }. ♦

Example 1.3 (Investments). There are other kinds of problems. When making an investment, we have to decide when to invest, where to invest, and how much. A combination of these will be our decision. ♦

Notation:
δ = decision
A = action space; δ ∈ A
θ = parameter
H0 = null hypothesis
HA = alternative hypothesis

1.2 Frequentist Approach

Now, what kind of information do we use when making these decisions? Is it a silly question? We know that statisticians collect random samples of data and base their statistics on them. So, their decisions are functions of data,

δ = δ(data) = δ(X1,...,Xn).

This is the frequentist approach. According to it, uncertainty comes from the random sample and its distribution. The only distributions, expectations, and variances considered are those of the data and of various statistics computed from the data. Population parameters are considered fixed. Statistical procedures are based on the distribution of data given these parameters,

f(x | θ) = f(X1, ..., Xn | θ).

Properties of these procedures can be stated in terms of long-run frequencies. For example (review Chapter 3, if needed),

– an estimator θˆ is unbiased if in a long run of random samples, it averages to the parameter θ;

– a test has significance level α if in a long run of random samples, 100α% of times the true hypothesis is rejected;

– an interval has confidence level (1 − α) if in a long run of random samples, (1 − α)100% of obtained confidence intervals contain the parameter, as shown in Figure 3.5, p. 38;

– and so on.

However, there are many situations when using only the data is not sufficient for meaningful decisions.

Example 1.4 (Same data, different conclusions (L. Savage)). Imagine three situations:

1. A music expert claims that she can easily distinguish between a line from Mozart and a line from Haydn. She listens to 10 extracts from Mozart’s and Haydn’s symphonies, and all 10 times she determines the composer correctly.

2. A lady drinks coffee every morning, and she claims she can always tell whether milk was stirred in it before sugar or sugar before milk. Let’s test her on 10 cups of coffee. Apparently, all 10 times she answers correctly.

3. A drunken traveler claims that he can always predict results of a coin toss. Well, ten coins are tossed, and all ten times he guesses right.

Notice that the data are identical, 10 out of 10. Let δ1, δ2, and δ3 be our decisions about the probability of a correct answer in all three cases. What would they be? Should δ1 = δ2 = δ3?

·················· If not, then we cannot say that δ = δ(data), as in the frequentist decision making. ♦

Also, the frequentist concept of a long run may inadequately reflect performance of statis- tical procedures.

Example 1.5 (Frequentist approach: misleading assessment of accuracy; Berger, #1.14). Let X be a Binomial variable with n = 100 and unknown parameter θ. For example, θ is a proportion of voters who support our candidate, and we'd like to estimate it based on a quick survey of randomly selected n = 100 voters. X participants voted for that candidate.

Suppose we are testing H0 : θ = 1/3 vs HA : θ = 2/3. If X < 50, we'll accept H0. If X > 50, reject H0 in favor of HA. However, if X = 50, we randomly choose between H0 and HA by tossing a fair coin.

(a) Calculate the probabilities of Type I and Type II errors for this test (some review of hypothesis testing may be needed for this).

·················· (Review Section 3.3, if needed.)

(b) If X = 50 is actually observed, what is the intuitive (conditional) probability of actually making an error in use of the test?

··················

(c) For the observed value of X = 50, calculate the p-value (a frequentist approach).

··················

Conclusion: the “long-run” frequentist probabilities of Type I and Type II errors as well as the p-value are entirely misleading when X = 50. We do not necessarily need a concept of a “long run” because we know the actually observed value of X (adding to an already hot continuing discussion about p-values and significance testing). ♦

Example 1.6 (Frequentist approach: long run is misleading). Suppose we need to run some test. Two labs can conduct the needed experiment and collect data. Say, one lab at AU and another at UMD. At AU, devices are accurate, and the probability of a Type I error is only 2%, whereas it is 10% at UMD. The lab is chosen at random, and it happened to be the AU lab. Next, the experiment was conducted, the data were collected, and the null hypothesis was rejected.

So, when we report results of this test, what significance level must we state, to be honest?

A frequentist will measure all probabilities over a long run, and in a long run, 50% of the time the test was done at UMD. Then, the possibility that the test could have been done at UMD will be included in the reported significance level.

A Bayesian, however, will argue that we know that the test was conducted at AU, and we know that the devices used were AU-accurate, so there is no reason to include the UMD accuracy in the report. ♦

Example 1.7 (Frequentist approach: problems with large samples and small samples; Berger, #1.10). Suppose θ is the percentage change in the yield of a process, under the effect of a very expensive new treatment. To obtain information about θ, we can observe i.i.d. Normal(θ, 2500) observations, X1, ..., Xn.

(a) If n = 25,000,000 and X̄ = 0.02, show that there is statistically significant evidence at the α = 0.05 level (one-tailed) that θ > 0. Would it be wise to adopt the new treatment because of the strong evidence that it increases the yield?

··················

(b) If n = 4 and X̄ = 30, show that there is no statistically significant evidence at the α = 0.1 level that θ > 0. Does it imply that it would be wise not to adopt the treatment?

··················
♦

Often the null hypothesis θ = θ0 is not understood literally as the exact equality, because that would be too much of a coincidence. Then, in the previous example, a practically non-significant 0.02% change in yield could be a part of H0. The main reason for stating H0 as an equality is to have only one null distribution and to make sure the probability of a Type I error, α, is given as just one number.

Example 1.8 (Misleading reliability; shouldn't we condition on the known result? Many similar examples in Christensen, sec. 2.1). There exists a test for a certain viral infection. It is 95% reliable for infected patients and 99% reliable for the healthy ones. That is, if a patient has the virus (event V), the test shows that (event S) with probability P{S | V} = 0.95, and if the patient does not have the virus, the test shows that with probability P{S̄ | V̄} = 0.99.

Consider a patient whose test result is positive (i.e., the test shows that the patient has the virus). Knowing that sometimes the test is wrong, naturally, the patient is eager to know the probability that he or she indeed has the virus. However, this conditional probability, P{V | S}, is not stated among the given characteristics of this test.

Suppose that 2% of all the patients are infected with the virus, P{V} = 0.02. If the test shows positive results, the (conditional) probability that a patient has the virus equals

P{V | S} = P{S | V} P{V} / ( P{S | V} P{V} + P{S | V̄} P{V̄} )
         = (0.95)(0.02) / ( (0.95)(0.02) + (1 − 0.99)(1 − 0.02) ) = 0.66.

Here, characteristics of the test seem very strong (probabilities 0.95 and 0.99 of a correct decision), but in fact, the probability of interest is only 0.66. This is the conditional probability of a correct decision given the data, i.e., given that the test result is positive. ♦
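Since this course uses R for computation, here is a minimal R sketch of the same calculation (the variable names below are ours, introduced just for illustration):

prior.V <- 0.02                 # P{V}, prevalence of the virus
sens    <- 0.95                 # P{S | V}, the test detects an infected patient
spec    <- 0.99                 # P{S-bar | V-bar}, the test clears a healthy patient

# Law of Total Probability: P{S} = P{S | V} P{V} + P{S | V-bar} P{V-bar}
prob.S <- sens * prior.V + (1 - spec) * (1 - prior.V)

sens * prior.V / prob.S         # Bayes Rule: P{V | S}, approximately 0.66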

Résumé: Frequentist statistical decision making takes into account only the uncertainty in the data. Statistical decisions are based on data alone, and their performance is evaluated in terms of a "long run". However, there are situations where such an approach is deficient, unnatural, or misleading in various ways.

FIGURE 1.1: Our prior distribution for the average starting salary. (The prior density π(θ) is plotted against θ, the mean salary, with most of its mass between $40,000 and $70,000.)

1.3 Bayesian approach

However, there is another approach - the Bayesian approach. According to it, uncertainty is attributed not only to the data but also to the unknown parameter θ. Some values of θ are more likely than others. Then, as long as we talk about the likelihood, we can define a whole distribution of values of θ. We call it a prior distribution, and it reflects our ideas, beliefs, and past experiences about the parameter before we collect and use the data.

Example 1.9 (Salaries). What do you think is the average starting annual salary of an AU graduate? Is it $20,000 per year? Unlikely, that's too low. Perhaps $200,000 per year? No, that's too high for a fresh graduate. Between $40,000 and $70,000 sounds like a reasonable range. We can certainly collect data on 100 recent graduates, compute their average salary, and use it as an estimate, but before that, we already have our beliefs about what the mean salary may be. We can express them as some distribution with the most likely range between $40,000 and $70,000 (Figure 1.1). ♦

One benefit of this approach is that we no longer have to explain our results in terms of a “long run”. Often we collect just one sample for our analysis and don’t experience any long runs of samples. Instead, with the Bayesian approach, we can state the result in terms of the distribution of parameter θ. For example, we can clearly state the probability for a parameter to belong to a certain interval, or the probability that the hypothesis is true. This would have been impossible under the frequentist approach.

Another benefit is that we can use both pieces of information, the data and the prior, to make better decisions. In Bayesian statistics, decisions are

δ = δ(data, prior distribution).

FIGURE 1.2: Two sources of information about the parameter θ: the data X ~ f(x | θ) and the prior π(θ) are combined by the Bayes formula into the posterior π(θ | x).

1.3.1 Prior and posterior

Now we have two sources of information to use in our Bayesian inference:

1. collected and observed data;

2. prior distribution of the parameter.

These two pieces are combined via the Bayes formula (p. 17).

Prior to the experiment, our knowledge about the parameter θ is expressed in terms of the prior distribution (prior pmf or pdf)

π(θ).

The observed sample of data X = (X1,...,Xn) has distribution (pmf or pdf)

f(x | θ) = f(x1, ..., xn | θ).

This distribution is conditional on θ. That is, different values of the parameter θ generate different distributions of data, and thus, conditional probabilities about X generally depend on the condition, θ.

Observed data add information about the parameter. The updated knowledge about θ can be expressed as the posterior distribution.

Posterior distribution:

π(θ | x) = π(θ | X = x) = f(x | θ) π(θ) / m(x).   (1.1)

The posterior distribution of the parameter θ is now conditioned on data X = x. Naturally, the conditional distributions f(x | θ) and π(θ | x) are related via the Bayes Rule (2.4).

According to the Bayes Rule, the denominator of (1.1), m(x), represents the unconditional distribution of data X. This is the marginal distribution (pmf or pdf) of the sample X. Being unconditional means that it is constant for different values of the parameter θ. It can be computed by the Law of Total Probability (see p. 17) or its continuous-case version.

Marginal distribution of data:

m(x) = Σ_θ f(x | θ) π(θ)   for discrete prior distributions π,
m(x) = ∫ f(x | θ) π(θ) dθ   for continuous prior distributions π.   (1.2)

Example 1.10 (Quality inspection). A manufacturer claims that the shipment contains only 5% of defective items, but the inspector feels that in fact it is 10%. We have to decide whether to accept or to reject the shipment based on θ, the proportion of defective parts.

Before we see the real data, let’s assign a 50-50 chance to both suggested values of θ, i.e.,

π(0.05) = π(0.10) = 0.5.

A random sample of 20 parts has 3 defective ones. Calculate the posterior distribution of θ.

Solution. Apply the Bayes formula (1.1). Given θ, the distribution of the number of defective parts X is Binomial(n = 20,θ). For x = 3, from Table A2, we have

f(x | θ = 0.05) = F(3 | θ = 0.05) − F(2 | θ = 0.05) = 0.9841 − 0.9245 = 0.0596

and

f(x | θ = 0.10) = F(3 | θ = 0.10) − F(2 | θ = 0.10) = 0.8670 − 0.6769 = 0.1901.

The marginal distribution of X (for x = 3) is

m(3) = f(x | 0.05) π(0.05) + f(x | 0.10) π(0.10)
     = (0.0596)(0.5) + (0.1901)(0.5) = 0.12485.

Posterior probabilities of θ = 0.05 and θ = 0.10 are now computed as

π(0.05 | X = 3) = f(x | 0.05) π(0.05) / m(3) = (0.0596)(0.5) / 0.12485 = 0.2387;
π(0.10 | X = 3) = f(x | 0.10) π(0.10) / m(3) = (0.1901)(0.5) / 0.12485 = 0.7613.

Conclusion. In the beginning, we had no preference between the two suggested values of θ. Then we observed a rather high proportion of defective parts, 3/20 = 15%. Taking this into account, θ = 0.10 is now about three times as likely as θ = 0.05. ♦
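The same computation in R, as a minimal sketch (dbinom replaces the Table A2 lookup, so the numbers match up to rounding):

theta <- c(0.05, 0.10)                       # the two suggested parameter values
prior <- c(0.5, 0.5)                         # 50-50 prior, pi(theta)
like  <- dbinom(3, size = 20, prob = theta)  # f(x | theta) for x = 3 defective parts
m     <- sum(like * prior)                   # marginal m(3), formula (1.2)
post  <- like * prior / m                    # posterior, formula (1.1)
round(post, 4)                               # about 0.2387 and 0.7613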

Notation:
π(θ) = prior distribution
π(θ | x) = posterior distribution
f(x | θ) = distribution of data (model)
m(x) = marginal distribution of data
X = (X1, ..., Xn), sample of data
x = (x1, ..., xn), observed values of X1, ..., Xn

Résumé: The Bayesian approach presumes a prior distribution of the unknown parameter. Adding the observed data, the Bayes Theorem converts the prior distribution into the posterior, which summarizes all we know about the parameter after seeing the data. Bayesian decisions are based on this posterior, and thus, they utilize both the data and the prior.

1.4 Loss function

Besides the data and the prior distribution, is there any other information that can appear useful in our decision making?

How about anticipating possible consequences of making an error? We know that under uncertainty, there is always a chance of making inaccurate decisions.

The third component of the Bayesian Decision Theory is the loss function

L(θ, δ) : Θ × A → R or [0, ∞),

which tells the amount of "penalty" for taking decision δ ∈ A when the actual parameter equals θ ∈ Θ. This penalty may be 0 if the decision is perfect, for example, if we accept the true null hypothesis or estimate a parameter θ with no error.

Example 1.11 (Loss function matters). Let's anticipate the consequences of:
– underestimation or overestimation of the value of some house;
– underestimation or overestimation of the expected lifetime of some device;
– underestimation or overestimation of some child's IQ;
– underestimation or overestimation of the pure premium, which is the amount an insurance company should expect to pay to cover its customer's claims.

··················

(By the way, insurance is a very popular application of the Bayesian approach. Many customers have little data, but there is rich prior information from so-called risk groups.) ♦

Example 1.12 (Loss function in testing hypotheses). Hypothesis testing is a rather simple situation. There are two possibilities: either the null H0 is true or the alternative HA is true. Also, there are two possible decisions - either accept H0 or reject it:

                                 Action space A
  Testing hypotheses        Reject H0        Accept H0
  Parameter   H0 is true    Type I error     correct
  space Θ     H0 is false   correct          Type II error

What would be reasonable losses for these 4 situations? Naturally, no penalty for correct decisions. The other two situations are dramatically different.

In clinical trials for efficacy, a Type I error results in marketing a useless treatment, and a Type II error results in stopping an efficient medication.

In clinical trials for toxicity, Type I error results in labeling a toxic treatment as safe, and Type II error is not noticing a real danger (as happened with Merck and its product Vioxx).

In criminal trials, Type I error means convicting an innocent defendant, and Type II error means acquitting a real criminal.

As we see, consequences are entirely different, and so are the losses.

  Loss function                    δ
  L(θ, δ)                 Reject H0    Accept H0
  θ    H0 is true            w0            0
       H0 is false            0            w1

1.4.1 Risks and optimal statistical decisions

Equipped with the loss function, we are looking for optimal statistical decisions, those that minimize the loss. Minimize in what sense? The loss function L(θ, δ) = L(θ, δ(X)) involves two sources of uncertainty: the unknown parameter θ and the data X = (X1, ..., Xn).

DEFINITION 1.1
Risk (frequentist risk) is the expected loss over all possible samples, given a fixed parameter:

R(θ, δ) = E_θ^X L(θ, δ(X)).

Notation:
E_θ^X L(θ, δ(X)) = Σ_x L(θ, δ(x)) P(x) or ∫ L(θ, δ(x)) f(x) dx,
the expected value for a fixed θ (subscript), taken over all possible samples x (superscript)
X = (X1, ..., Xn), random sample
x = (x1, ..., xn), a value of such a sample

With respect to the risk, the optimal decisions are still not clear because R(θ,δ) depends on the unknown parameter θ. However, it is clear which rules we should not use...

DEFINITION 1.2

Decision δ1 is R-better than decision δ2 if

R(θ, δ1) ≤ R(θ, δ2) for all θ,
R(θ, δ1) < R(θ, δ2) for some θ.

Decision δ is inadmissible if there exists a decision R-better than δ. Decision δ is admissible if it is not inadmissible.

Decision δ is minimax if it minimizes sup_θ R(θ, δ), the worst possible risk over all θ ∈ Θ. That is,

sup_θ R(θ, δ_minimax) = inf_{δ ∈ A} sup_θ R(θ, δ)

(its max risk is the min-max risk).

Minimax decisions are conservative because they protect against the worst situation where the risk is maximized. They are the best decisions in a game against an intelligent opponent who will always like to give you the worst case. In statistical games, the players know that they are acting against intelligent opponents, and therefore, they devise minimax strategies. We’ll learn them in game theory. So far, we have not used the prior distribution.

DEFINITION 1.3
Bayes risk is the expected frequentist risk,

r(π, δ) = E^{π(θ)} R(θ, δ) = E^{π(θ)} E_θ^X L(θ, δ),

where the outer expectation is taken over the prior distribution π(θ). So it is the loss function averaged over all possible samples of data and all possible parameters.

As we have already seen, the Bayes decisions are based on the posterior distribution, that is, conditioned on the known data X.

DEFINITION 1.4
Posterior risk is the expected loss, where the expectation is taken over the posterior distribution of the parameter θ,

ρ(π, δ | X) = E^{π(θ | X)} L(θ, δ) = E{ L(θ, δ) | X }.

So, the posterior risk is the loss function averaged over parameters θ, given the known data X.

Bayes decision rules minimize the Bayes risk and, as we'll see pretty soon, they also minimize the posterior risk. That is,

ρ(π, δ_Bayes | X) = inf_{δ ∈ A} ρ(π, δ | X)

for every sample X, and

r(π, δ_Bayes) = inf_{δ ∈ A} r(π, δ).

Résumé: Simple frequentist statistics are based on just the observed data. Decision theory takes into account the data and the loss function. Bayesian statistics is based on the data and the prior distribution. Thus, the Bayes decision rules are based on three components: the data, the prior, and the loss. Bayes rules minimize the posterior risk, given the observed data. Minimax rules minimize the largest, or worst possible, risk.
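For a no-data decision problem, where the frequentist risk R(θ, δ) reduces to the loss itself, these definitions can be checked numerically. Below is a minimal R sketch; the 3 × 2 loss matrix and the prior are hypothetical, invented only for illustration:

L <- matrix(c(0, 4,
              2, 1,
              5, 0), nrow = 3, byrow = TRUE)  # hypothetical: rows theta1..theta3, columns delta1, delta2
prior <- c(0.2, 0.5, 0.3)                     # hypothetical prior pi(theta)

bayes.risk <- as.vector(prior %*% L)          # r(pi, delta) for each decision
which.min(bayes.risk)                         # the Bayes decision minimizes the Bayes risk

worst.risk <- apply(L, 2, max)                # sup over theta of R(theta, delta)
which.min(worst.risk)                         # the minimax decision minimizes the worst risk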

Let’s now have a little practice with the new concepts.

Example 1.13. A usual situation. There is an exam tomorrow, but a student does not have enough time to prepare. In fact, he has only 70% of the time he needs to be fully prepared. What should he do then? For example, he can study 70% of all topics. Or 70% of each topic.

Suppose for simplicity that the exam will consist of one problem covering one topic. Describe minimax, Bayes, admissible, and inadmissible rules for this student. ♦

Example 1.14 (Berger, #1.3). A company has to decide whether to accept or reject a lot of incoming parts (label these decisions as δ1 and δ2, respectively). The lots are of three types: θ1 (very good), θ2 (acceptable), and θ3 (bad). The loss L(θ, δ) incurred in making the decision is given in the following table.

         δ1    δ2
  θ1      0     3
  θ2      1     2
  θ3      3     0

The prior belief is that π(θ1) = π(θ2) = π(θ3) = 1/3.
(a) What is the Bayes decision?
(b) What is the minimax decision?

·················· ♦

Example 1.15. We observe a Normal variable X with unknown mean θ and known variance σ². Estimate θ under the squared-error loss function

L(θ, δ) = (δ − θ)².

Consider the action space of linear decision rules A = { δc(X) = cX for c ∈ R }.

(a) Graph the risks of δ0, δ1, and δ1/2.
(b) Show that δc is inadmissible if c < 0 or c > 1.
(c) Find the minimax decision.

·················· ♦

Example 1.16 (Berger, #1.6). The owner of a ski shop must order skis for the upcoming season. Orders must be placed in quantities of 25 pairs of skis. The cost per pair of skis is $50 if 25 pairs are ordered, $45 if 50 are ordered, and $40 if 75 are ordered. The skis will be sold at $75 per pair. Any skis left over at the end of the year can be sold (for sure) at $25 per pair. If the owner runs out of skis during the season, he will suffer a loss of "goodwill" among unsatisfied customers. He rates this loss as $5 per unsatisfied customer. For simplicity, the owner feels that demand for the skis will be 30, 40, 50, or 60 pairs of skis, with probabilities 0.2, 0.4, 0.2, and 0.2, respectively.
(a) Describe A, Θ, the loss matrix, and the prior distribution.
(b) Which decisions are admissible?
(c) What is the Bayes decision?
(d) What is the minimax decision?

··················
♦

References: Berger 1.1-1.3, 1.5, 1.6.1-1.6.3; Christensen 1, 2.1.

Chapter 2

Review of Some Probability

2.1 Rules of Probability ...... 15
2.2 Discrete random variables ...... 17
2.2.1 Joint distribution and marginal distributions ...... 18
2.2.2 Expectation, variance, and covariance ...... 19
2.3 Discrete distribution families ...... 21
2.4 Continuous random variables ...... 23

Here we review some basic probability concepts and statistical methods that you have probably seen in other courses.

2.1 Rules of Probability

Probability of any event is a number between 0 and 1. The empty event has probability 0, and the sample space has probability 1. There are rules for computing probabilities of unions, intersections, and complements. For a union of disjoint events, probabilities are added. For an intersection of independent events, probabilities are multiplied.

Probability of a union:

P{A ∪ B} = P{A} + P{B} − P{A ∩ B}   (2.1)

For mutually exclusive events, P{A ∪ B} = P{A} + P{B}.

For 3 events,

P{A ∪ B ∪ C} = P{A} + P{B} + P{C} − P{A ∩ B} − P{A ∩ C} − P{B ∩ C} + P{A ∩ B ∩ C}.

Complement rule:  P{Ā} = 1 − P{A}

Independent events:  P{E1 ∩ ... ∩ En} = P{E1} · ... · P{En}

Given occurrence of event B, one can compute the conditional probability of event A. The unconditional probability of A can be computed from its conditional probabilities by the Law of Total Probability. The Bayes Rule, often used in testing and diagnostics, relates the conditional probabilities of A given B and of B given A.

DEFINITION 2.1 Conditional probability of event A given event B is the probability that A occurs when B is known to occur.

Notation:  P{A | B} = conditional probability of A given B

Conditional probability:

P{A | B} = P{A ∩ B} / P{B}   (2.2)

Rewriting (2.2) in a different way, we obtain the general formula for the probability of intersection.

Intersection, general case:

P{A ∩ B} = P{B} P{A | B}   (2.3)

Independence

Now we can give an intuitively very clear definition of independence.

DEFINITION 2.2
Events A and B are independent if occurrence of B does not affect the probability of A, i.e., P{A | B} = P{A}.

Substituting this into (2.3) yields

P{A ∩ B} = P{A} P{B}.

This is our old formula for independent events.

Bayes Rule:

P{B | A} = P{A | B} P{B} / P{A}   (2.4)

Law of Total Probability:

P{A} = Σ_{j=1}^{k} P{A | Bj} P{Bj}   (2.5)

In the case of two events (k = 2),

P{A} = P{A | B} P{B} + P{A | B̄} P{B̄}

Together with the Bayes Rule, it makes the following popular formula

Bayes Rule for two events:

P{B | A} = P{A | B} P{B} / ( P{A | B} P{B} + P{A | B̄} P{B̄} )

2.2 Discrete random variables

Discrete random variables can take a finite or countable number of isolated values with different probabilities. Collection of all such probabilities is a distribution, which describes the behavior of a random variable. Random vectors are sets of random variables; their behavior is described by the joint distribution. Marginal probabilities can be computed from the joint distribution by the Addition Rule.

DEFINITION 2.3 A random variable is a function of an outcome,

X = f(ω).

In other words, it is a quantity that depends on chance. 18 STAT 618 Bayesian Statistics Lecture Notes

DEFINITION 2.4
Collection of all the probabilities related to X is the distribution of X. The function

P(x) = P{X = x}

is the probability mass function, or pmf. The cumulative distribution function, or cdf, is defined as

F(x) = P{X ≤ x} = Σ_{y ≤ x} P(y).   (2.6)

The set of possible values of X is called the support of the distribution F.

Discrete random variables take values in a finite or countable set. In particular, it means that their values can be listed, or arranged in a sequence. Continuous random variables assume a whole interval of values. Their distribution is described by a probability density function.

2.2.1 Joint distribution and marginal distributions

DEFINITION 2.5
If X and Y are random variables, then the pair (X, Y) is a random vector. Its distribution is called the joint distribution of X and Y. Individual distributions of X and Y are then called the marginal distributions.

The joint distribution of a vector is a collection of probabilities for the vector (X, Y) to take a value (x, y). The joint probability mass function of X and Y is the probability of the intersection,

P(x, y) = P{(X, Y) = (x, y)} = P{X = x ∩ Y = y}.

Addition Rule:

P_X(x) = P{X = x} = Σ_y P_{(X,Y)}(x, y)
P_Y(y) = P{Y = y} = Σ_x P_{(X,Y)}(x, y)   (2.7)

That is, to get the marginal pmf of one variable, we add the joint probabilities over all values of the other variable.

DEFINITION 2.6 Random variables X and Y are independent if

P_{(X,Y)}(x, y) = P_X(x) P_Y(y)

for all values of x and y. This means that the events {X = x} and {Y = y} are independent for all x and y; in other words, variables X and Y take their values independently of each other.

2.2.2 Expectation, variance, and covariance

The average value of a random variable is its expectation. Variability around the expectation is measured by the variance and the . Covariance and correlation measure association of two random variables.

DEFINITION 2.7 Expectation or expected value of a random variable X is its mean, the average value.

Expectation, discrete case:

µ = E(X) = Σ_x x P(x)   (2.8)

Expectation of a function Y = g(X) is computed by a similar formula,

E{g(X)} = Σ_x g(x) P(x).   (2.9)

Properties of expectations:

E(aX + bY + c) = a E(X) + b E(Y) + c   (2.10)

In particular,
E(X + Y) = E(X) + E(Y)
E(aX) = a E(X)
E(c) = c

For independent X and Y,
E(XY) = E(X) E(Y)

DEFINITION 2.8
Variance of a random variable is defined as the expected squared deviation from the mean. For discrete random variables, variance is

σ² = Var(X) = E(X − EX)² = Σ_x (x − µ)² P(x)

DEFINITION 2.9
Standard deviation is the square root of variance,

σ = Std(X) = √Var(X)

DEFINITION 2.10

Covariance σ_XY = Cov(X, Y) is defined as

Cov(X, Y) = E{(X − EX)(Y − EY)} = E(XY) − E(X) E(Y)

It summarizes the interrelation of two random variables.

DEFINITION 2.11
Correlation coefficient between variables X and Y is defined as

ρ = Cov(X, Y) / ( (Std X)(Std Y) )

Properties of variances and covariances:

Var(aX + bY + c) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)
Cov(aX + bY, cZ + dW) = ac Cov(X, Z) + ad Cov(X, W) + bc Cov(Y, Z) + bd Cov(Y, W)
Cov(X, Y) = Cov(Y, X)
ρ(X, Y) = ρ(Y, X)

In particular,   (2.11)
Var(aX + b) = a² Var(X)
Cov(aX + b, cY + d) = ac Cov(X, Y)
ρ(aX + b, cY + d) = ρ(X, Y)

For independent X and Y,
Cov(X, Y) = 0
Var(X + Y) = Var(X) + Var(Y)

Notation:
µ or E(X) = expectation
σ²_X or Var(X) = variance
σ_X or Std(X) = standard deviation
σ_XY or Cov(X, Y) = covariance
ρ_XY = correlation coefficient

2.3 Discrete distribution families

Different phenomena can often be described by the same probabilistic model, or a family of distributions. The most commonly used discrete families are Binomial, including Bernoulli, Negative Binomial, including Geometric, and Poisson. Each family of distributions is used for a certain general type of situations; it has its parameters and a clear formula and/or a table for computing probabilities. These families are summarized in Section 13.1.1.

DEFINITION 2.12 A random variable with two possible values, 0 and 1, is called a Bernoulli variable, its distribution is Bernoulli distribution, and any experiment with a binary outcome is called a Bernoulli trial.

Bernoulli distribution:
p = probability of success
P(x) = q = 1 − p if x = 0;  P(x) = p if x = 1
E(X) = p
Var(X) = pq

DEFINITION 2.13
A variable described as the number of successes in a sequence of independent Bernoulli trials has Binomial distribution. Its parameters are n, the number of trials, and p, the probability of success.

Remark: “Binomial” can be translated as “two numbers,” bi meaning “two” and nom meaning “a number,” thus reflecting the concept of binary outcomes.

Binomial distribution:
n = number of trials
p = probability of success
P(x) = C(n, x) p^x q^(n−x)   (Table A2)
E(X) = np
Var(X) = npq

DEFINITION 2.14 The number of Bernoulli trials needed to get the first success has Geometric distribution.

Geometric distribution:
p = probability of success
P(x) = (1 − p)^(x−1) p,  x = 1, 2, ...   (2.12)
E(X) = 1/p
Var(X) = (1 − p)/p²

DEFINITION 2.15 In a sequence of independent Bernoulli trials, the number of trials needed to obtain k successes has Negative Binomial distribution.

Negative Binomial distribution:
k = number of successes
p = probability of success
P(x) = C(x − 1, k − 1) (1 − p)^(x−k) p^k,  x = k, k + 1, ...   (2.13)
E(X) = k/p
Var(X) = k(1 − p)/p²

DEFINITION 2.16
The number of rare events occurring within a fixed period of time has Poisson distribution.

Poisson distribution:
λ = frequency, the average number of events
P(x) = e^(−λ) λ^x / x!,  x = 0, 1, 2, ...   (Table A3)
E(X) = λ
Var(X) = λ

Poisson approximation to Binomial:

Binomial(n, p) ≈ Poisson(λ), where n ≥ 30, p ≤ 0.05, np = λ   (2.14)

Relations:
Bern(p) = Bin(1, p);  Bin(n, p) = sum of n independent Bern(p)
Geom(p) = NegBin(1, p);  NegBin(k, p) = sum of k independent Geom(p)
Bin(n, p) → Poisson(λ) as n → ∞, p → 0, np → λ
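A quick numerical check of approximation (2.14) in R (the values n = 100 and p = 0.03 are chosen by us, only to illustrate the regime n ≥ 30, p ≤ 0.05):

n <- 100; p <- 0.03
x <- 0:8
round(cbind(binomial = dbinom(x, n, p),       # exact Binomial(n, p) pmf
            poisson  = dpois(x, n * p)), 4)   # Poisson(np) pmf; the columns nearly agree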

2.4 Continuous random variables

Continuous distributions are used to model various times, sizes, measurements, and all other random variables that assume an entire interval of possible values.

Continuous distributions are described by their densities that play a role analogous to probability mass functions of discrete variables. Computing probabilities essentially reduces to integrating a density over the given set. Expectations and variances are defined similarly to the discrete case, replacing a probability mass function by a density and summation by integration.

Continuous variables can take any value of an interval. For these variables, the probability mass function (pmf) is P(x) = 0 for all x,

and the cumulative distribution function (cdf) equals

F(x) = P{X ≤ x} = P{X < x}.

FIGURE 2.1: Probabilities are areas under the density curve: the area under f(x) between a and b equals P{a < X < b}.

DEFINITION 2.17
Probability density function (pdf, density) is the derivative of the cdf, f(x) = F′(x). The distribution is called continuous if it has a density.

Probability density function:

f(x) = F′(x)
P{a < X < b} = ∫_a^b f(x) dx

TABLE 2.1: Pmf P(x) versus pdf f(x).

  Distribution              Discrete                              Continuous
  Definition                P(x) = P{X = x} (pmf)                 f(x) = F′(x) (pdf)
  Computing probabilities   P{X ∈ A} = Σ_{x ∈ A} P(x)             P{X ∈ A} = ∫_A f(x) dx
  Cumulative distribution   F(x) = P{X ≤ x} = Σ_{y ≤ x} P(y)      F(x) = P{X ≤ x} = ∫_{−∞}^{x} f(y) dy
  Total probability         Σ_x P(x) = 1                          ∫_{−∞}^{∞} f(x) dx = 1

FIGURE 2.2: Expectation of a continuous variable as a center of gravity of its density f(x).

DEFINITION 2.18
For a vector of random variables, the joint cumulative distribution function is defined as

F_{(X,Y)}(x, y) = P{X ≤ x ∩ Y ≤ y}.

The joint density is the mixed derivative of the joint cdf,

f_{(X,Y)}(x, y) = ∂²F_{(X,Y)}(x, y) / ∂x∂y.

The most commonly used distribution families are summarized in Section 13.1.2. Let us only review how to compute Normal probabilities using Table A1 – it comes in handy in a number of problems.

DEFINITION 2.19 Normal distribution with “standard parameters” µ = 0 and σ = 1 is called Standard Normal distribution.

TABLE 2.2: Joint and marginal distributions in discrete and continuous cases.

                            Discrete                        Continuous
  Marginal distributions    P(x) = Σ_y P(x, y)              f(x) = ∫ f(x, y) dy
                            P(y) = Σ_x P(x, y)              f(y) = ∫ f(x, y) dx
  Independence              P(x, y) = P(x) P(y)             f(x, y) = f(x) f(y)
  Computing probabilities   P{(X, Y) ∈ A}                   P{(X, Y) ∈ A}
                            = ΣΣ_{(x,y) ∈ A} P(x, y)        = ∫∫_{(x,y) ∈ A} f(x, y) dx dy

Notation Z = Standard Normal random variable

φ(x) = (1/√(2π)) e^(−x²/2),  Standard Normal pdf

Φ(x) = ∫_{−∞}^{x} (1/√(2π)) e^(−z²/2) dz,  Standard Normal cdf

A Standard Normal variable, usually denoted by Z, can be obtained from a non-standard Normal(µ, σ) random variable X by standardizing, that is, subtracting the mean and dividing by the standard deviation,

Z = (X − µ)/σ.   (2.15)

Unstandardizing Z, we can reconstruct the initial variable X,

X = µ + σ Z. (2.16)

To compute probabilities about an arbitrary Normal random variable X, we have to stan- dardize it first, as in (2.15), then use Table A1.

Example 2.1 (Computing non-standard Normal probabilities). Suppose that the average household income in some country is 900 coins, and the standard deviation is 200 coins. Assuming the Normal distribution of incomes, compute the proportion of “the middle class,” whose income is between 600 and 1200 coins.

Solution. Standardize and use Table A1. For a Normal(µ = 900, σ = 200) variable X,

P{600 < X < 1200} = P{ (600 − µ)/σ < (X − µ)/σ < (1200 − µ)/σ }
                  = P{−1.5 < Z < 1.5} = Φ(1.5) − Φ(−1.5) = 0.9332 − 0.0668 = 0.8664. ♦
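In R, pnorm plays the role of Table A1, so the same probability can be obtained directly or after standardizing:

pnorm(1200, mean = 900, sd = 200) - pnorm(600, mean = 900, sd = 200)
pnorm(1.5) - pnorm(-1.5)          # the standardized version; both give 0.8664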

TABLE 2.3: Moments for discrete and continuous distributions.

               Discrete                                  Continuous
  E(X)         Σ_x x P(x)                                ∫ x f(x) dx
  Var(X)       E(X − µ)² = Σ_x (x − µ)² P(x)             E(X − µ)² = ∫ (x − µ)² f(x) dx
               = Σ_x x² P(x) − µ²                        = ∫ x² f(x) dx − µ²
  Cov(X, Y)    E(X − µ_X)(Y − µ_Y)                       E(X − µ_X)(Y − µ_Y)
               = Σ_x Σ_y (x − µ_X)(y − µ_Y) P(x, y)      = ∫∫ (x − µ_X)(y − µ_Y) f(x, y) dx dy
               = Σ_x Σ_y xy P(x, y) − µ_X µ_Y            = ∫∫ xy f(x, y) dx dy − µ_X µ_Y

Theorem 1 (Central Limit Theorem). Let X1, X2, ... be independent random variables with the same expectation µ = E(Xi) and the same standard deviation σ = Std(Xi), and let

S_n = Σ_{i=1}^{n} Xi = X1 + ... + Xn.

As n → ∞, the standardized sum

Z_n = (S_n − E(S_n)) / Std(S_n) = (S_n − nµ) / (σ√n)

converges in distribution to a Standard Normal random variable; that is,

F_{Z_n}(z) = P{ (S_n − nµ)/(σ√n) ≤ z } → Φ(z)   (2.17)

for all z.

In particular, a Binomial variable is a sum of i.i.d. Bernoulli variables; therefore,

Binomial(n, p) ≈ Normal( µ = np, σ = √(np(1 − p)) )   (2.18)

These approximations are widely used for large-sample statistics.

Example 2.2. A new computer virus attacks a folder consisting of 200 files. Each file gets damaged with probability 0.2 independently of other files. What is the probability that fewer than 50 files get damaged?

Solution. The number X of damaged files has Binomial distribution with n = 200, p = 0.2, µ = np = 40, and σ = √(np(1 − p)) = 5.657. Applying the Central Limit Theorem with the continuity correction,

P{X < 50} = P{X < 49.5} = P{ (X − 40)/5.657 < (49.5 − 40)/5.657 } = Φ(1.68) = 0.9535.

Notice that the properly applied continuity correction replaces 50 with 49.5, not 50.5. Indeed, we are interested in the event that X is strictly less than 50. This includes all values up to 49 and corresponds to the interval [0, 49] that we expand to [0, 49.5]. In other words, the events {X < 50} and {X < 49.5} are the same; they include the same possible values of X. The events {X < 50} and {X < 50.5} are different because the latter includes X = 50, and the former does not. Replacing {X < 50} with {X < 50.5} would have changed the probability and would have given a wrong answer. ♦
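A short R check of this example, comparing the exact Binomial probability with the CLT approximation:

n <- 200; p <- 0.2
pbinom(49, n, p)                              # exact P(X < 50) = P(X <= 49)
mu <- n * p; sigma <- sqrt(n * p * (1 - p))
pnorm((49.5 - mu) / sigma)                    # CLT with continuity correction, about 0.9535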

References: Cowles ch. 2; Lunn 1.1-1.3

Chapter 3

Review of Basic Statistics

3.1 Parameters and their estimates ...... 29
3.1.1 Mean ...... 29
3.1.2 Median ...... 31
3.1.3 Quantiles ...... 32
3.1.4 Variance and standard deviation ...... 33
3.1.5 Method of moments ...... 35
3.1.6 Method of maximum likelihood ...... 36
3.2 Confidence intervals ...... 37
3.3 Testing Hypotheses ...... 39
3.3.1 Level α tests ...... 40
3.3.2 Z-tests ...... 41
3.3.3 T-tests ...... 44
3.3.4 Duality: two-sided tests and two-sided confidence intervals ...... 44
3.3.5 P-value ...... 44

3.1 Parameters and their estimates

DEFINITION 3.1
A population consists of all units of interest. Any numerical characteristic of a population is a parameter. A sample consists of observed units collected from the population. It is used to make statements about the population. Any function of a sample is called a statistic.

Notation:
θ = population parameter
θ̂ = its estimator, obtained from a sample


FIGURE 3.1: Population parameters (µ, σ, σ², and others) and sample statistics (X̄, s, s², and others). Sampling leads from the population to a sample; statistical inference (estimation, hypothesis testing, modeling, etc.) leads back to statements about the population.

3.1.1 Mean

DEFINITION 3.2
Sample mean X̄ is the arithmetic average,

X̄ = (X1 + ... + Xn)/n

DEFINITION 3.3
An estimator θ̂ is unbiased for a parameter θ if its expectation equals the parameter,

E(θ̂) = θ for all possible values of θ.

Bias of θ̂ is defined as Bias(θ̂) = E(θ̂ − θ).

Sample mean X̄ estimates the population mean µ = E(X) unbiasedly.

FIGURE 3.2: Computing medians of continuous distributions: (a) Uniform, (b) Exponential. In each case, the cdf F(x) crosses the 0.5 level at the median M.

DEFINITION 3.4
An estimator θ̂ is consistent for a parameter θ if the probability of its sampling error of any magnitude converges to 0 as the sample size increases to infinity. Stating it rigorously,

P{ |θ̂ − θ| > ε } → 0 as n → ∞

for any ε > 0.

3.1.2 Median

DEFINITION 3.5 Median means a “central” value. Sample median Mˆ is a number that is exceeded by at most a half of obser- vations and is preceded by at most a half of observations. Population median M is a number that is exceeded with probability no greater than 0.5 and is preceded with probability no greater than 0.5. That is, M is such that P X>M 0.5 { } ≤ P X

For continuous distributions, computing a population median reduces to solving one equation: P X>M = 1 F (M) 0.5 { } − ≤ F (M)=0.5. P X

(a) Binomial (n=5, p=0.5) (b) Binomial (n=5, p=0.4) many roots no roots

✻F (x) ✻F (x)

1 1

F (x) = 0.5 F (x) = 0.5 0.5 0.5

✲ ✲ 0 1 2 3 4 5 x 0 1 2 3 4 5 x

FIGURE 3.3: Computing medians of discrete distributions.

In the first case, any number in this interval, excluding the ends, is a median. Notice that the median in this case is not unique (Figure 3.3a). Often the middle of this interval is reported as the median.

In the second case, the smallest x with F(x) ≥ 0.5 is the median. It is the value of x where the cdf jumps over 0.5 (Figure 3.3b).
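In R, qbinom returns the smallest x with F(x) ≥ 0.5, which matches this rule; here is a minimal check of the two cases in Figure 3.3:

qbinom(0.5, size = 5, prob = 0.5)   # returns 2: F(2) = 0.5 exactly, so any number
                                    # in (2, 3) is a median of Binomial(5, 0.5)
qbinom(0.5, size = 5, prob = 0.4)   # returns 2: the smallest x with F(x) >= 0.5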

Sample median:
If n is odd, the (n + 1)/2-th smallest observation is a median.
If n is even, any number between the (n/2)-th smallest and the (n + 2)/2-th smallest observations is a median.

3.1.3 Quantiles

DEFINITION 3.6
A p-quantile of a population is such a number x that solves the equations

P{X < x} ≤ p
P{X > x} ≤ 1 − p

A sample p-quantile is any number that exceeds at most 100p% of the sample and is exceeded by at most 100(1 − p)% of the sample.

A γ-percentile is a (0.01γ)-quantile. First, second, and third quartiles are the 25th, 50th, and 75th percentiles. They split a population or a sample into four equal parts. A median is at the same time a 0.5-quantile, 50th percentile, and 2nd quartile.

Computing quantiles is very similar to computing medians.

Example 3.1 (Sample quartiles). Let us compute the 1st and 3rd quartiles of CPU times. The ordered sample is below.

9 15 19 22 24 25 30 34 35 35 36 36 37 38 42 43 46 48 54 55 56 56 59 62 69 70 82 82 89 139

First quartile Q̂1. For p = 0.25, we find that 25% of the sample equals np = 7.5, and 75% of the sample is n(1 − p) = 22.5 observations. From the ordered sample, we see that only the 8th element, 34, has no more than 7.5 observations to the left and no more than 22.5 observations to the right of it. Hence, Q̂1 = 34.

Third quartile Q̂3. Similarly, the third sample quartile is the 23rd smallest element, Q̂3 = 59. ♦
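The hand computation can be reproduced in R; with type = 1 (the inverse empirical cdf), quantile matches the definition used here:

cpu <- c(9, 15, 19, 22, 24, 25, 30, 34, 35, 35, 36, 36, 37, 38, 42,
         43, 46, 48, 54, 55, 56, 56, 59, 62, 69, 70, 82, 82, 89, 139)
quantile(cpu, probs = c(0.25, 0.75), type = 1)   # 34 and 59, as above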

FIGURE 3.4: Bias and standard error of an estimator: a biased or an unbiased estimator may have a high or a low standard error. In each case, the dots represent estimates θ̂ obtained from 10 different random samples.

3.1.4 Variance and standard deviation

DEFINITION 3.7

For a sample (X1, X2, ..., Xn), the sample variance is defined as

s² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².   (3.1)

It measures variability among observations and estimates the population variance σ² = Var(X).

Sample standard deviation is the square root of the sample variance, s = √s². It measures variability in the same units as X and estimates the population standard deviation σ = Std(X).

DEFINITION 3.8
Standard error of an estimator θ̂ is its standard deviation, σ(θ̂) = Std(θ̂).

Notation:
σ(θ̂) = standard error of estimator θ̂ of parameter θ
s(θ̂) = estimated standard error = σ̂(θ̂)

3.1.5 Method of moments

DEFINITION 3.9
The k-th population moment is defined as

µ_k = E(X^k).

The k-th sample moment

m_k = (1/n) Σ_{i=1}^{n} Xi^k

estimates µ_k from a sample (X1, ..., Xn). The first sample moment is the sample mean X̄.

Central moments are computed similarly, after centralizing the data, that is, subtracting the mean. To estimate k parameters by the method of moments, equate the first k population and sample moments:

µ1 = m1
......
µk = mk

The left-hand sides of these equations depend on the distribution parameters. The right-hand sides can be computed from data. The method of moments estimator is the solution of this system of equations.

Example 3.2 (Method of moments for the Gamma distribution). Apparently, the CPU times in Example 3.1 on p. 33 have a Gamma distribution with some parameters α and λ. To estimate these parameters, we need two equations. From the given data, we compute

m1 = X̄ = 48.2333 and m2′ = s² = 679.7122,

and write two equations,

µ1 = E(X) = α/λ = m1
µ2′ = Var(X) = α/λ² = m2′.

It is convenient to use the second central moment here because we already know the expression for the variance m2′ = Var(X) of a Gamma variable. Solving this system in terms of α and λ, we get the method of moments estimates

α̂ = m1²/m2′ = 3.4227
λ̂ = m1/m2′ = 0.0710. ♦

3.1.6 Method of maximum likelihood

DEFINITION 3.10 Maximum likelihood estimator is the parameter value that maximizes the likelihood of the observed sample. For a discrete distribution, we maximize the joint pmf of data P (X1,...,Xn). For a continuous distribution, we maximize the joint density f(X1,...,Xn).

Manual application of this method requires knowledge of Calculus I.

For a discrete distribution, the probability of a given sample is the joint pmf of the data,

P{X = (X1, ..., Xn)} = P(X) = P(X1, ..., Xn) = Π_{i=1}^{n} P(Xi),

because in a simple random sample, all observed Xi are independent. For a continuous distribution, the joint density of the data, f(X) = Π_{i=1}^{n} f(Xi), is the likelihood.

To maximize this likelihood, we consider the critical points by taking derivatives with respect to all unknown parameters and equating them to 0. The maximum can only be attained at such parameter values θ where the derivative (∂/∂θ)P(X) equals 0, where it does not exist, or at the boundary of the set of possible values of θ.

A nice computational shortcut is to take logarithms first. Differentiating the sum

ln Π_{i=1}^{n} P(Xi) = Σ_{i=1}^{n} ln P(Xi)

is easier than differentiating the product Π P(Xi). Besides, logarithm is an increasing function, so the likelihood P(X) and the log-likelihood ln P(X) are maximized by exactly the same parameters.

Example 3.3 (Poisson). The pmf of the Poisson distribution is

P(x) = e^(−λ) λ^x / x!,

and its logarithm is

ln P(x) = −λ + x ln λ − ln(x!).

Thus, we need to maximize

ln P(X) = Σ_{i=1}^{n} (−λ + Xi ln λ) + C = −nλ + ln λ Σ_{i=1}^{n} Xi + C,

where C = −Σ ln(Xi!) is a constant that does not contain the unknown parameter λ.

Find the critical point(s) of this log-likelihood. Differentiating it and equating its derivative to 0, we get

(∂/∂λ) ln P(X) = −n + (1/λ) Σ_{i=1}^{n} Xi = 0.

This equation has only one solution,

λ̂ = (1/n) Σ_{i=1}^{n} Xi = X̄.

Since this is the only critical point, and since the likelihood vanishes (converges to 0) as λ ↓ 0 or λ ↑ ∞, we conclude that λ̂ is the maximizer. Therefore, it is the maximum likelihood estimator of λ. ♦

3.2 Confidence intervals

DEFINITION 3.11
An interval [a, b] is a (1 − α)100% confidence interval for the parameter θ if it contains the parameter with probability (1 − α),

P{a ≤ θ ≤ b} = 1 − α.

The coverage probability (1 − α) is also called a confidence level.

This confidence level is illustrated in Figure 3.5. Suppose that we collect many random samples and produce a confidence interval from each of them. If these are (1 − α)100% confidence intervals, then we expect (1 − α)100% of them to cover θ and 100α% of them to miss it. This is the classical frequentist "long-run" measure of performance.

Therefore, without a prior distribution, it is wrong to say, "I computed a 90% confidence interval, it is [3, 6]. The parameter belongs to this interval with probability 90%." For a frequentist, the parameter is constant; it either belongs to the interval [3, 6] (with probability 1) or does not. In this case, 90% refers to the proportion of confidence intervals that contain the unknown parameter in a long run.

FIGURE 3.5: Confidence intervals for the same parameter θ, obtained from different samples of data, and their coverage of θ.

If parameter θ has an unbiased, Normally distributed estimator θ̂, then

θ̂ ± zα/2 · σ(θ̂) = [ θ̂ − zα/2 · σ(θ̂), θ̂ + zα/2 · σ(θ̂) ]   (3.2)

is a (1 − α)100% confidence interval for θ (confidence interval, Normal distribution). If the distribution of θ̂ is approximately Normal, this is an approximately (1 − α)100% confidence interval.

Confidence interval for the mean, σ known:  X̄ ± zα/2 · σ/√n   (3.3)

Confidence interval for the mean, σ unknown:  X̄ ± tα/2 · s/√n,   (3.4)
where tα/2 is a critical value from the T-distribution with n − 1 degrees of freedom.

Example 3.4. Construct a 95% confidence interval for the population mean based on a sample of measurements

2.5, 7.4, 8.0, 4.5, 7.4, 9.2

if measurement errors have Normal distribution, and the measurement device guarantees a standard deviation of σ =2.2.

Solution. This sample has size n = 6 and sample mean X̄ = 6.50. To attain a confidence level of

1 − α = 0.95,

we need α = 0.05 and α/2 = 0.025. Hence, we are looking for quantiles

q0.025 = −z0.025 and q0.975 = z0.025.

From Table A1, we find that q0.975 = 1.960. Substituting these values into (3.3), we obtain a 95% confidence interval for µ,

X̄ ± zα/2 · σ/√n = 6.50 ± (1.960) · 2.2/√6 = 6.50 ± 1.76, or [4.74, 8.26]. ♦

3.3 Testing Hypotheses

To begin, we need to state exactly what we are testing. These are the hypothesis and the alternative.

Notation H0 = hypothesis (the null hypothesis) HA = alternative (the alternative hypothesis)

H0 and HA are simply two mutually exclusive statements. Each test results either in acceptance of H0 or its rejection in favor of HA.

DEFINITION 3.12 Alternative of the type H : µ = µ covering regions on both sides of the A 6 0 hypothesis (H0 : µ = µ0) is a two-sided alternative.

Alternative HA : µ<µ0 covering the region to the left of H0 is one-sided, left-tail.

Alternative HA : µ>µ0 covering the region to the right of H0 is one-sided, right-tail. 40 STAT 618 Bayesian Statistics Lecture Notes

DEFINITION 3.13 A type I error occurs when we reject the true null hypothesis. A type II error occurs when we accept the false null hypothesis.

DEFINITION 3.14
Probability of a type I error is the significance level of a test,

α = P{ reject H0 | H0 is true }.

Probability of rejecting a false hypothesis is the power of the test,

p(θ) = P{ reject H0 | θ; HA is true }.

It is usually a function of the parameter θ because the alternative hypothesis includes a set of parameter values. Also, the power is the probability of avoiding a Type II error.

3.3.1 Level α tests

A standard algorithm for a level α test of a hypothesis H0 against an alternative HA consists of 3 steps.

Step 1. . Testing hypothesis is based on a test statistic T , a quantity com- puted from the data that has some known, tabulated distribution F0 if the hypothesis H0 is true. Test statistics are used to discriminate between the hypothesis and the alternative.

Step 2: Acceptance region and rejection region. Next, we consider the null distribution F0. This is the distribution of the test statistic T when the hypothesis H0 is true. If it has a density f0, then the whole area under the density curve is 1, and we can always find a portion of it whose area is α, as shown in Figure 3.6. It is called the rejection region (R). The remaining part, the complement of the rejection region, is called the acceptance region (A = R̄). By the complement rule, its area is (1 − α).

These regions are selected in such a way that the values of the test statistic T in the rejection region provide stronger support of HA than the values T ∈ A. For example, suppose that T is expected to be large if HA is true. Then the rejection region corresponds to the right tail of the null distribution F0 (Figure 3.6). Also, see Figure 3.7 for the right-tail, left-tail, and two-sided Z-tests.

Step 3: Result and its interpretation. Accept the hypothesis H0 if the test statistic T belongs to the acceptance region. Reject H0 in favor of the alternative HA if T belongs to the rejection region.

FIGURE 3.6: Acceptance and rejection regions.

Notice the frequentist interpretation of the significance level – in a long run of level α tests, the proportion of rejected null hypotheses is α, provided that the null hypotheses were actually true.

Even after the test, we cannot tell whether the null hypothesis is true or false. We would need to see the entire population for that. A test only determines whether the data provided sufficient evidence against H0 in favor of HA.

3.3.2 Z-tests

Consider testing a hypothesis about a population parameter θ. Suppose that its estimator θˆ has Normal distribution, at least approximately, and we know E(θˆ) and Var(θˆ) if the hypothesis is true.

Then the test statistic
$$Z = \frac{\hat\theta - E(\hat\theta)}{\sqrt{\mathrm{Var}(\hat\theta)}} \tag{3.5}$$
has the Standard Normal distribution, and we can construct acceptance and rejection regions for a level α test from Table A1. The most popular Z-tests are summarized in Table 3.1.

Example 3.5 (Two-sample Z-test of proportions). A quality inspector finds 10 defective parts in a sample of 500 parts received from manufacturer A. Out of 400 parts from manufacturer B, she finds 12 defective ones. A computer-making company uses these parts in their computers and claims that the quality of parts produced by A and B is the same. At the 5% level of significance, do we have enough evidence to disprove this claim?

Solution. We test H0 : pA = pB, or H0 : pA − pB = 0, against HA : pA ≠ pB. This is a two-sided test because no direction of the alternative has been indicated. We only need to verify whether or not the proportions of defective parts are equal for manufacturers A and B.

FIGURE 3.7: Acceptance and rejection regions for a Z-test with (a) a one-sided right-tail alternative; (b) a one-sided left-tail alternative; (c) a two-sided alternative.

Step 1: Test statistic. We are given: p̂A = 10/500 = 0.02 from a sample of size n = 500; p̂B = 12/400 = 0.03 from a sample of size m = 400. The tested value is D = 0.

As we know, for these Bernoulli data, the variance depends on the unknown parameters pA and pB, which are estimated by the sample proportions p̂A and p̂B.

The test statistic then equals
$$Z = \frac{\hat p_A - \hat p_B - D}{\sqrt{\dfrac{\hat p_A(1-\hat p_A)}{n} + \dfrac{\hat p_B(1-\hat p_B)}{m}}} = \frac{0.02 - 0.03}{\sqrt{\dfrac{(0.02)(0.98)}{500} + \dfrac{(0.03)(0.97)}{400}}} = -0.945.$$

Step 2: Acceptance and rejection regions. This is a two-sided test; thus we divide α by 2, find z_{0.05/2} = z_{0.025} = 1.96, and

reject H0 if |Z| ≥ 1.96;  accept H0 if |Z| < 1.96.

Step 3: Result. The evidence against H0 is insufficient because |Z| < 1.96. Although the sample proportions of defective parts are unequal, the difference between them appears too small to claim that the population proportions are different. ♦

For each test, H0 states the value of a parameter θ with estimator θ̂; the table gives E(θ̂) and Var(θ̂) when H0 is true, and the test statistic is Z = (θ̂ − θ0)/√Var(θ̂).

One-sample Z-tests for means and proportions, based on a sample of size n:

  H0 : µ = µ0    (θ = µ, θ̂ = X̄; E(θ̂) = µ0, Var(θ̂) = σ²/n):
      Z = (X̄ − µ0)/(σ/√n)
  H0 : p = p0    (θ = p, θ̂ = p̂; E(θ̂) = p0, Var(θ̂) = p0(1 − p0)/n):
      Z = (p̂ − p0)/√(p0(1 − p0)/n)

Two-sample Z-tests comparing means and proportions of two populations, based on independent samples of sizes n and m:

  H0 : µX − µY = D    (θ̂ = X̄ − Ȳ; E(θ̂) = D, Var(θ̂) = σX²/n + σY²/m):
      Z = (X̄ − Ȳ − D)/√(σX²/n + σY²/m)
  H0 : p1 − p2 = D    (θ̂ = p̂1 − p̂2; E(θ̂) = D, Var(θ̂) = p1(1 − p1)/n + p2(1 − p2)/m):
      Z = (p̂1 − p̂2 − D)/√(p̂1(1 − p̂1)/n + p̂2(1 − p̂2)/m)
  H0 : p1 = p2    (θ̂ = p̂1 − p̂2; E(θ̂) = 0, Var(θ̂) = p(1 − p)(1/n + 1/m), where p = p1 = p2):
      Z = (p̂1 − p̂2)/√(p̂(1 − p̂)(1/n + 1/m)), where p̂ = (n p̂1 + m p̂2)/(n + m)

TABLE 3.1: Summary of Z-tests.

Example 3.6 (Example 3.5, continued). When testing equality of two proportions, we can also use the pooled sample proportion that estimates both pA and pB, which are equal under H0. Here the pooled proportion equals
$$\hat p = \frac{10 + 12}{500 + 400} = 0.0244,$$
so that
$$Z = \frac{0.02 - 0.03}{\sqrt{(0.0244)(0.9756)\left(\frac{1}{500} + \frac{1}{400}\right)}} = -0.966.$$
This does not affect our result. We obtained a different value of the Z-statistic, but it also belongs to the acceptance region. We still don't have significant evidence against the equality of the two population proportions. ♦

Summary of T-tests, with the null hypothesis, conditions, test statistic t, and degrees of freedom:

  H0 : µ = µ0    (sample size n; unknown σ):
      t = (X̄ − µ0)/(s/√n),  n − 1 degrees of freedom
  H0 : µX − µY = D    (sample sizes n, m; unknown but equal standard deviations, σX = σY):
      t = (X̄ − Ȳ − D)/(s_p √(1/n + 1/m)),  n + m − 2 degrees of freedom
  H0 : µX − µY = D    (sample sizes n, m; unknown, unequal standard deviations, σX ≠ σY):
      t = (X̄ − Ȳ − D)/√(s_X²/n + s_Y²/m),  Satterthwaite approximation for the degrees of freedom

TABLE 3.2: Summary of T-tests.

3.3.3 T-tests

When we don't know the population standard deviation, we estimate it. The resulting T-statistic has the form
$$t = \frac{\hat\theta - E(\hat\theta)}{s(\hat\theta)} = \frac{\hat\theta - E(\hat\theta)}{\sqrt{\widehat{\mathrm{Var}}(\hat\theta)}}.$$
In the case when the distribution of θ̂ is Normal, the test is based on Student's T-distribution, with acceptance and rejection regions according to the direction of HA; see Table 3.2.

3.3.4 Duality: two-sided tests and two-sided confidence intervals

A level α Z-test of H0 : θ = θ0 vs HA : θ ≠ θ0 accepts the null hypothesis if and only if a symmetric (1 − α)100% confidence Z-interval for θ contains θ0 (see Fig. 3.8).

3.3.5 P-value

So far, we have been testing hypotheses by means of acceptance and rejection regions. A significance level α is needed for this; the results of our test depend on it.

How do we choose α, the probability of making a type I sampling error, rejecting the true hypothesis? Of course, when it seems too dangerous to reject a true H0, we choose a low significance level. How low? Should we choose α = 0.01? Perhaps 0.001? Or even 0.0001?

FIGURE 3.8: Duality of tests and confidence intervals.

Also, if our observed test statistic Z = Z_obs belongs to a rejection region but is "too close to call" (see, for example, Figure 3.9), then how do we report the result? Formally, we should reject the null hypothesis, but practically, we realize that a slightly different significance level α could have expanded the acceptance region just enough to cover Z_obs and force us to accept H0. Suppose that the result of our test is crucially important. For example, the choice of a business strategy for the next ten years depends on it. In this case, can we rely so heavily on the choice of α? And if we rejected the true hypothesis just because we chose α = 0.05 instead of α = 0.04, then how do we explain to the chief executive officer that the situation was marginal? What is the statistical term for "too close to call"?

FIGURE 3.9: This test is "too close to call": formally we reject the null hypothesis although the Z-statistic is almost at the boundary.

Using a P-value approach, we try not to rely on the level of significance. In fact, let us try to test a hypothesis using all levels of significance! Considering all α ∈ (0, 1) (because it is the probability of a Type I error), we notice:

Case 1. If the level of significance is very low, we accept the null hypothesis (see Figure 3.10a). A low value of

α = P{reject the null hypothesis when it is true}

makes it very unlikely to reject the hypothesis because it yields a very small rejection region. The right-tail area above the rejection region equals α.

FIGURE 3.10: (a) Under a low level of significance α, we accept the null hypothesis. (b) Under a high level of significance, we reject it.

Case 2. At the other extreme, a high significance level α makes it likely to reject the null hypothesis and corresponds to a large rejection region. A sufficiently large α will produce a rejection region so large that it will cover our test statistic, forcing us to reject H0 (see Figure 3.10b).

Conclusion: there exists a boundary value between α-to-accept (case 1) and α-to-reject (case 2). This number is the P-value (Figure 3.11).

DEFINITION 3.15 P-value is the lowest significance level α that forces rejection of the null hy- pothesis. P-value is also the highest significance level α that forces acceptance of the null hypothesis.

Testing hypotheses with a P-value

Once we know a P-value, we can test hypotheses at all significance levels. Figure 3.11 shows that for all α < P we accept the null hypothesis, and for all α > P, we reject it.

FIGURE 3.11: P-value separates α-to-accept and α-to-reject.

Computing P-values

Let us look at Figure 3.10 again. Start from Figure 3.10a, gradually increase α, and keep your eye on the vertical bar separating the acceptance and rejection regions. It will move to the left until it hits the observed test statistic Z_obs. At this point, our decision changes, and we switch from case 1 (Figure 3.10a) to case 2 (Figure 3.10b). Increasing α further, we pass the Z-statistic and continue rejecting the null hypothesis.

What happens at the border of α-to-accept and α-to-reject? Definition 3.15 says that this borderline α is the P-value, P = α. Also, at this border our observed Z-statistic coincides with the critical value z_α, Z_obs = z_α, and thus,
$$P = \alpha = P\{Z \ge z_\alpha\} = P\{Z \ge Z_{\mathrm{obs}}\}.$$
In this formula, Z is any Standard Normal random variable, and Z_obs is our observed test statistic, which is a concrete number computed from the data. First, we compute Z_obs, then use Table A1 to calculate
$$P\{Z \ge Z_{\mathrm{obs}}\} = 1 - \Phi(Z_{\mathrm{obs}}).$$
P-values for the left-tail and for the two-sided alternatives are computed similarly, as given in Table 3.3.

For testing H0 : θ = θ0, the P-value of a Z-test is computed as follows:

  Alternative HA          P-value              Computation
  θ > θ0 (right-tail)     P{Z ≥ Z_obs}         1 − Φ(Z_obs)
  θ < θ0 (left-tail)      P{Z ≤ Z_obs}         Φ(Z_obs)
  θ ≠ θ0 (two-sided)      P{|Z| ≥ |Z_obs|}     2(1 − Φ(|Z_obs|))

TABLE 3.3: P-values for Z-tests.
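(The computations in Table 3.3 are easy to automate. The following Python sketch, with our own naming, uses the Standard Normal cdf Φ from scipy.)

from scipy.stats import norm

def z_test_pvalue(z_obs, alternative):
    """P-value of a Z-test; alternative: 'right', 'left', or 'two-sided'."""
    if alternative == 'right':
        return 1 - norm.cdf(z_obs)            # P{Z >= Z_obs}
    if alternative == 'left':
        return norm.cdf(z_obs)                # P{Z <= Z_obs}
    return 2 * (1 - norm.cdf(abs(z_obs)))     # P{|Z| >= |Z_obs|}

print(z_test_pvalue(-0.94, 'two-sided'))      # about 0.347, as in Example 3.7 below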

Understanding P-values

Looking at Table 3.3, we see that the P-value is the probability of observing a test statistic at least as extreme as Z_obs or t_obs. Being "extreme" is determined by the alternative. For a right-tail alternative, large numbers are extreme; for a left-tail alternative, small numbers are extreme; and for a two-sided alternative, both large and small numbers are extreme. In general, the more extreme the test statistic we observe, the stronger the support it provides for the alternative. This leads to another, equivalent definition of a P-value.

DEFINITION 3.16 P-value is the probability of observing a test statistic that is as extreme as or more extreme than the test statistic computed from a given sample.

The following philosophy can be used when we test hypotheses by means of a P-value.

We are deciding between the null hypothesis H0 and the alternative HA. We observe a test statistic Z_obs. If H0 were true, how likely would it be to observe such a statistic? In other words, are the observed data consistent with H0?

A high P-value tells us that this or an even more extreme value of Z_obs is quite possible under H0, and therefore, we see no contradiction with H0. The null hypothesis is not rejected.

Conversely, a low P-value signals that such an extreme test statistic is unlikely if H0 is true. However, we really observed it. Then our data are not consistent with the hypothesis, and we should reject H0. For example, if P = 0.0001, there is only 1 chance in 10,000 of observing what we really observed. The evidence supporting the alternative is highly significant in this case.

Example 3.7 (Quality inspection). In Example 3.5, we compared the quality of parts produced by two manufacturers by a two-sided test. We obtained a test statistic

Z_obs = −0.94.

The P-value for this test equals
$$P = P\{|Z| \ge |-0.94|\} = 2(1 - \Phi(0.94)) = 2(1 - 0.8264) = 0.3472.$$
This is a rather high P-value (greater than 0.1), and the null hypothesis is not rejected. Given H0, there is a 34% chance of observing what we really observed. There is no contradiction with H0, and therefore, no evidence that the quality of parts is not the same. ♦

References: Cowles 4.1

Chapter 4

Choice of a Prior Distribution

4.1 Subjective choice of a prior distribution ...... 50 4.2 Empirical Bayes solutions ...... 50 4.3 Conjugate priors ...... 51 4.3.1 Gamma family is conjugate to the Poisson model ...... 51 4.3.2 Beta family is conjugate to the Binomial model ...... 53 4.3.3 Normal family is conjugate to the Normal model ...... 53 4.3.4 Combining the data and the prior ...... 54 4.4 Non-informative prior distributions and generalized Bayes rules 55 4.4.1 Invariant non-informative priors ...... 57

Bayes decision rules are based on three components: the data, the prior, and the loss,
$$\delta_{\mathrm{Bayes}} = \delta(X, \pi, L).$$

So, for Bayesian decision making, we need

– to collect data X = (X1, . . . , Xn);

– to choose a prior distribution π(θ) of the unknown parameters;

– to choose a loss function L(θ, δ).

Data collection is pretty well understood and well covered in Sampling courses. The prior and the loss function are to be chosen, and surely, our final decision will depend on that choice. We discuss the choice of a prior distribution in this chapter and the choice of a loss function in the next one. There are perhaps four general ways to choose a prior distribution:

– Subjectively: quantify your personal beliefs, express your uncertainty about the parameter θ in the form of a distribution;

– Empirically: let the data suggest the prior distribution. People often use historical data or data on similar cases;

– Conveniently: take a convenient form of the prior π(θ) in order to get a mathematically tractable posterior distribution π(θ | X);

– Non-informatively: in the absence of any information about the parameter prior to the experiment, which prior distribution would most fairly reflect this situation?


4.1 Subjective choice of a prior distribution

A subjectively determined prior does not come from a direct mathematical formula. It is just an attempt to express your original beliefs about the unknown parameter, and your uncertainty about it, in a usable mathematical form. This is something we did in Example 1.9 on p. 6.

A few suggestions. Often we can determine a few related probabilities and fit a distribution to them. Sometimes we can compare probabilities of different values of θ or probabilities of intervals. Usually, it is easy to compare the chances of events and their complements, like P{θ ∈ [a, b]} and P{θ ∉ [a, b]}: which one is more likely? Sometimes one can determine some percentiles: say, with probability 25%, parameter θ does not exceed what?

Example 4.1 (Subjective prior). Before introducing a smoke-free campus rule, we would like to explore the proportion of smokers among AU students. That's an unknown parameter θ ∈ [0, 1]. How likely do you think it is that 0.2 ≤ θ ≤ 0.4? I think it is about 60% likely, π([0.2, 0.4)) ≈ 0.6. Also, I am almost sure that θ < 0.5, i.e., the majority do not smoke. So, let π((0.5, 1]) ≈ 0.01. It remains to compare π([0, 0.2)) with π([0.4, 0.5)). I think θ ∈ [0, 0.2) is twice as likely as θ ∈ [0.4, 0.5). That means π([0, 0.2)) ≈ 0.26 and π([0.4, 0.5)) ≈ 0.13, in order to make

π([0, 1]) = π([0, 0.2)) + π([0.2, 0.4)) + π([0.4, 0.5)) + π((0.5, 1]) ≈ 0.26 + 0.6 + 0.13 + 0.01 = 1.

Our constructed prior distribution of θ is given as the following histogram,

[Histogram of the constructed prior on 0 ≤ θ ≤ 1, in bins of width 0.1: height 0.13 on [0, 0.2) and on [0.4, 0.5), height 0.30 on [0.2, 0.4), and height near 0 on (0.5, 1].] ♦

4.2 Empirical Bayes solutions

The general idea of empirical Bayes analysis is to estimate the prior distribution from the data. This can be done in several ways:

– Parametric empirical Bayes. A family of prior distributions π(θ | λ) is chosen, but its parameter(s) λ is unknown. This λ will then be estimated from the marginal distribution m(x) = m(x | λ).

– Nonparametric empirical Bayes. No family of prior distributions is assumed. Thus, there is no form of the posterior either. Instead, the form of a Bayes decision rule is obtained directly, bypassing the posterior.

– Hierarchical Bayes. We can also take a "completely Bayes" approach: assume a family of prior distributions π(θ | λ) and estimate the unknown λ in the Bayesian way. That is, we put a prior distribution ρ(λ) on λ and estimate it, before estimating the parameter of interest θ. This second-level prior is called a hyperprior, and parameter λ is a hyperparameter. Sometimes this hierarchy of priors and their parameters has more than two levels.

We’ll study the empirical Bayes approach later in this course.

4.3 Conjugate priors

Let us focus on the mathematically convenient families of prior distributions.

A suitably chosen prior distribution of θ may lead to a very tractable form of the posterior.

DEFINITION 4.1 A family of prior distributions π is conjugate to the model f(x | θ) if the posterior distribution belongs to the same family.

Three classical examples of conjugate families are given below.

4.3.1 Gamma family is conjugate to the Poisson model

Let (X1, . . . , Xn) be a sample from the Poisson(θ) distribution with a Gamma(α, λ) prior distribution of θ. Then
$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta) = \prod_{i=1}^n \frac{e^{-\theta}\,\theta^{x_i}}{x_i!} \;\sim\; e^{-n\theta}\,\theta^{\sum x_i}. \tag{4.1}$$

Remark about dropping constant coefficients. At the end of (4.1), we dropped the factorials (x_i!) and wrote that the result is "proportional" (∼) to e^{−nθ} θ^{Σx_i}. Dropping terms that don't contain θ often simplifies the computation. The form of the posterior distribution can be obtained without the constant term. If needed, we can evaluate the normalizing constant at the end, making π(θ | x) a proper distribution with total probability 1. In particular, the marginal distribution m(x) can be dropped because it is θ-free. But keep in mind that with these computations we obtain the posterior distribution "up to a constant coefficient."

The Gamma prior distribution of θ has density
$$\pi(\theta) \sim \theta^{\alpha-1}\,e^{-\lambda\theta}.$$
As a function of θ, this prior density has the same form as the model f(x | θ): a power of θ multiplied by an exponential function. This is the general idea behind conjugate families.

Then, the posterior distribution of θ given X = x is
$$\pi(\theta \mid \mathbf{x}) \sim f(\mathbf{x} \mid \theta)\,\pi(\theta) \sim \left(e^{-n\theta}\theta^{\sum x_i}\right)\left(\theta^{\alpha-1}e^{-\lambda\theta}\right) = \theta^{\alpha + \sum x_i - 1}\,e^{-(\lambda+n)\theta}.$$
Comparing with the general form of a Gamma density (sec. 13.1.2), we see that π(θ | x) is the Gamma distribution with new parameters
$$\alpha_x = \alpha + \sum_{i=1}^n x_i \quad\text{and}\quad \lambda_x = \lambda + n.$$
We can conclude that:

1. Gamma family of prior distributions is conjugate to Poisson models.

2. Having observed a Poisson sample X = x, we update the Gamma(α, λ) prior distribution of θ to the Gamma(α + Σx_i, λ + n) posterior.

The Gamma distribution family is rather rich; it has two parameters. There is often a good chance to find a member of this large family that suitably reflects our knowledge about θ.

Example 4.2 (Network blackouts). The number of network blackouts each week has a Poisson(θ) distribution. The weekly rate of blackouts θ is not known exactly, but according to past experience with similar networks, it averages 4 blackouts with a standard deviation of 2.

There exists a Gamma distribution with the given mean µ = α/λ = 4 and standard deviation σ = √α/λ = 2. Its parameters α and λ can be obtained by solving the system
$$\begin{cases} \alpha/\lambda = 4 \\ \sqrt{\alpha}/\lambda = 2 \end{cases} \;\Rightarrow\; \alpha = (4/2)^2 = 4, \quad \lambda = 4/2^2 = 1.$$
Hence, we can assume the Gamma(α = 4, λ = 1) prior distribution of θ. It is convenient to have a conjugate prior because the posterior will then belong to the Gamma family too.

Suppose there were X1 = 2 blackouts this week. Given that, the posterior distribution of θ is Gamma with parameters

α_x = α + 2 = 6,  λ_x = λ + 1 = 2.

If no blackouts occur during the next week, the updated posterior parameters become

α_x = α + 2 + 0 = 6,  λ_x = λ + 2 = 3.

This posterior distribution has an average weekly rate of 6/3 = 2 blackouts per week. Two weeks with very few blackouts reduced our estimate of the average rate from 4 to 2. ♦
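(The Gamma-Poisson update is a one-liner in any language. Here is a Python sketch, with a hypothetical function name of our choosing, replaying the two weeks of Example 4.2.)

def gamma_poisson_update(alpha, lam, data):
    # posterior is Gamma(alpha + sum of counts, lambda + sample size)
    return alpha + sum(data), lam + len(data)

alpha, lam = 4, 1                                    # prior: mean 4, st. dev. 2
alpha, lam = gamma_poisson_update(alpha, lam, [2])   # week 1: 2 blackouts -> Gamma(6, 2)
alpha, lam = gamma_poisson_update(alpha, lam, [0])   # week 2: 0 blackouts -> Gamma(6, 3)
print(alpha, lam, alpha / lam)                       # 6 3 2.0 (posterior mean)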

4.3.2 Beta family is conjugate to the Binomial model

A sample from the Binomial(k, θ) distribution (assume k is known) has the probability mass function
$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^n \binom{k}{x_i}\,\theta^{x_i}(1-\theta)^{k-x_i} \;\sim\; \theta^{\sum x_i}(1-\theta)^{nk - \sum x_i}.$$
The density of the Beta(α, β) prior distribution has the same form, as a function of θ,

$$\pi(\theta) \sim \theta^{\alpha-1}(1-\theta)^{\beta-1} \quad\text{for } 0 < \theta < 1$$
(see Section 13.1.2 in the Appendix). Then, the posterior density of θ is
$$\pi(\theta \mid \mathbf{x}) \sim f(\mathbf{x} \mid \theta)\,\pi(\theta) \sim \theta^{\alpha + \sum_{i=1}^n x_i - 1}\,(1-\theta)^{\beta + nk - \sum_{i=1}^n x_i - 1},$$
and we recognize the Beta density with new parameters
$$\alpha_x = \alpha + \sum_{i=1}^n x_i \quad\text{and}\quad \beta_x = \beta + nk - \sum_{i=1}^n x_i.$$
Hence,

1. Beta family of prior distributions is conjugate to the Binomial model.

2. Posterior parameters are α_x = α + Σx_i and β_x = β + nk − Σx_i.

4.3.3 Normal family is conjugate to the Normal model

Consider now a sample from a Normal distribution with an unknown mean θ and a known variance σ²:
$$f(\mathbf{x} \mid \theta) = \prod_{i=1}^n \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x_i-\theta)^2}{2\sigma^2}\right\} \sim \exp\left\{-\sum_{i=1}^n \frac{(x_i-\theta)^2}{2\sigma^2}\right\} \sim \exp\left\{\theta\,\frac{\sum x_i}{\sigma^2} - \frac{n\theta^2}{2\sigma^2}\right\} = \exp\left\{\frac{n}{\sigma^2}\left(\theta\bar X - \frac{\theta^2}{2}\right)\right\}.$$

If the prior distribution of θ is also Normal, with prior mean µ and prior variance τ², then
$$\pi(\theta) \sim \exp\left\{-\frac{(\theta-\mu)^2}{2\tau^2}\right\} \sim \exp\left\{\theta\,\frac{\mu}{\tau^2} - \frac{\theta^2}{2}\cdot\frac{1}{\tau^2}\right\},$$
and again, it has a similar form to f(x | θ).

The posterior density of θ equals
$$\pi(\theta \mid \mathbf{x}) \sim f(\mathbf{x} \mid \theta)\,\pi(\theta) \sim \exp\left\{\theta\left(\frac{n\bar X}{\sigma^2} + \frac{\mu}{\tau^2}\right) - \frac{\theta^2}{2}\left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)\right\} \sim \exp\left\{-\frac{(\theta-\mu_x)^2}{2\tau_x^2}\right\},$$
where
$$\mu_x = \frac{n\bar X/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2} \quad\text{and}\quad \tau_x^2 = \frac{1}{n/\sigma^2 + 1/\tau^2}. \tag{4.2}$$

This posterior distribution is certainly Normal with parameters µx and τx.

We can conclude that:

1. Normal family of prior distributions is conjugate to the Normal model with unknown mean;

2. Posterior parameters are given by (4.2).

Our results are summarized in Table 4.1. We'll see more examples of conjugate prior distributions in quizzes and exercises.

  Model f(x | θ)     Prior π(θ)      Posterior π(θ | x)
  Poisson(θ)         Gamma(α, λ)     Gamma(α + nX̄, λ + n)
  Binomial(k, θ)     Beta(α, β)      Beta(α + nX̄, β + n(k − X̄))
  Normal(θ, σ)       Normal(µ, τ)    Normal( (nX̄/σ² + µ/τ²)/(n/σ² + 1/τ²), (n/σ² + 1/τ²)^{−1/2} )

TABLE 4.1: Three classical conjugate families.

4.3.4 Combining the data and the prior

Let us look again at the last example of a Normal model and a Normal prior for the population mean. We see in (4.2) that the posterior mean
$$\mu_x = E(\theta \mid X) = \frac{n\bar X/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2} = \left(\frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\right)\bar X + \left(\frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}\right)\mu$$
is a weighted average of the prior mean µ and the sample mean X̄. This is how the prior information and the observed data are combined in the case of Normal distributions.

How will this posterior mean behave when it is computed from a large sample? As the sample size n increases, we get more information from the data, and as a result, the frequentist estimator will dominate. According to (4.2), the posterior mean converges to the sample mean X̄ as n → ∞,
$$\mu_x = \left(\frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\right)\bar X + \left(\frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}\right)\mu \;\longrightarrow\; \bar X \quad\text{as } n \to \infty.$$
The posterior mean will also converge to X̄ when τ → ∞. A large τ means a lot of uncertainty in the prior distribution of θ; thus, naturally, we should rather use the observed data as a more reliable source of information in this case.

On the other hand, a large σ indicates a lot of uncertainty in the observed sample. If that is the case, the prior distribution is more reliable, and as we see in (4.2), µ_x ≈ µ for large σ,
$$\mu_x = \left(\frac{n/\sigma^2}{n/\sigma^2 + 1/\tau^2}\right)\bar X + \left(\frac{1/\tau^2}{n/\sigma^2 + 1/\tau^2}\right)\mu \;\longrightarrow\; \mu \quad\text{as } \sigma \to \infty.$$
We get the same result when τ → 0, which implies that the prior mean µ is a very accurate estimator of θ, and there is practically no uncertainty about it.
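(A short Python sketch, with illustrative numbers of our own choosing, makes these limiting cases visible at a glance.)

def posterior_mean(xbar, n, sigma2, mu, tau2):
    w = (n / sigma2) / (n / sigma2 + 1 / tau2)   # weight on the sample mean, see (4.2)
    return w * xbar + (1 - w) * mu

xbar, mu = 10.0, 4.0
print(posterior_mean(xbar, n=5,     sigma2=1.0, mu=mu, tau2=1.0))   # 9.0: a compromise
print(posterior_mean(xbar, n=10**6, sigma2=1.0, mu=mu, tau2=1.0))   # ~10: large n, data dominate
print(posterior_mean(xbar, n=5,     sigma2=1.0, mu=mu, tau2=1e9))   # ~10: vague prior
print(posterior_mean(xbar, n=5,     sigma2=1e9, mu=mu, tau2=1.0))   # ~4: noisy data, prior dominates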

4.4 Non-informative prior distributions and generalized Bayes rules

One of the main arguments of non-Bayesians against Bayesians has been the subjective choice of the prior distribution. Indeed, it is not always trivial to come up with a realistic distribution that truly reflects our prior knowledge and uncertainty about the unknown parameter. As an extreme case, what can we do if we have no prior knowledge whatsoever? No information about the parameter until we see the actual data... However, we would still like to use Bayesian methods because of their nice properties, which we'll see in further chapters. Is there a "fair" prior distribution that reflects our absence of knowledge? Such a distribution would be called a non-informative prior distribution.

Example 4.3 (Caution: there are transformations!). Consider estimation of the parameter θ of the Binomial(k, θ) distribution. We know that θ ∈ [0, 1], and suppose that nothing else is known about θ. Then, we should choose a prior distribution that gives equal weight to all values of θ, making them all "equally likely". So, let π(θ) ∼ Uniform(0, 1)? Indeed, this seems to be the most natural non-informative choice. However... any non-linear transformation of θ, its reparameterization, appears non-Uniform!

For example, consider another parameter η = θ². It also belongs to [0, 1]. Its cumulative distribution function is
$$F_\eta(t) = P\{\eta \le t\} = P\{\theta^2 \le t\} = P\{\theta \le \sqrt{t}\} = \sqrt{t},$$

FIGURE 4.1: Normal prior densities with increasing prior variance Var(θ) = τ².

which leads to the density
$$f_\eta(t) = F'_\eta(t) = \frac{1}{2\sqrt{t}}.$$
This density is no longer Uniform! How come we had no information about the parameter θ, but somehow we do possess some information about η = θ²??? ♦

So, a simple approach did not work. What seemed to be a non-informative prior distribution of θ appears informative for θ². Let us try a different way.

Looking again at the case of a Normal model with a Normal prior in Sec. 4.3.4, we noticed that a high prior variance τ² = Var(θ) means a lot of uncertainty about the unknown parameter, and therefore, little information about it. Taking it to the extreme, when τ² → ∞, we are infinitely uncertain about θ. As τ² → ∞, the Normal(µ, τ) prior density becomes more and more flat, converging to a constant (Fig. 4.1).

One problem though... There is no constant density on R = (−∞, ∞). There is a constant measure, Lebesgue measure, with π(θ) ≡ 1 for all θ ∈ R, but
$$\int_{-\infty}^{\infty} \pi(\theta)\,d\theta = \int_{-\infty}^{\infty} d\theta = \infty,$$
so this is not a probability measure...

Nevertheless, Lebesgue measure gives us a fine posterior distribution,
$$\pi(\theta \mid \mathbf{x}) \sim f(\mathbf{x} \mid \theta)\,\pi(\theta) \sim \exp\left\{\frac{n}{\sigma^2}\left(\theta\bar X - \frac{\theta^2}{2}\right)\right\} \sim \exp\left\{-\frac{(\theta-\bar X)^2}{2\sigma^2/n}\right\} \sim \text{Normal}(\bar X, \sigma^2/n).$$

The relations between the sample mean X̄ and the parameter θ are pretty symmetric in this case:

Given θ:  X̄ ∼ Normal(θ, σ²/n)
Given X:  θ ∼ Normal(X̄, σ²/n)

That is, without any prior information, after seeing the data, we have exactly as much uncertainty about θ as the data contain. This is quite reasonable. So, such a prior is worth considering, even though it is not, strictly speaking, a distribution.

DEFINITION 4.2 An improper prior distribution is an infinite measure on the parameter space Θ (that is, ∫_Θ dπ(θ) = ∞) which produces a proper posterior distribution. A decision that minimizes the posterior risk under an improper prior is called a generalized Bayes rule.

Notice from (4.2) that µ_x ≠ X̄ for any proper Normal prior. However, µ_x → X̄ under proper priors as τ → ∞, and in the limiting case, µ_x = X̄ under the improper constant prior. Generalized Bayes rules are limits of proper Bayes rules.

4.4.1 Invariant non-informative priors

So, which priors are "truly" non-informative in different situations? Let us try to generalize what we have achieved. Our solution will depend on the model, a family of distributions of X.

Location family

Parameter θ is a location parameter if the distribution of the data has the form
$$F_\theta(x) = F(x - \theta).$$
Then, different θ cause shifts in the distribution of X without changing its shape, variation, and other properties. In this case, a non-informative prior distribution should treat θ and its shift η = θ + c equally. That is, for any Borel (that is, measurable, a set that has a probability) set A ⊂ R, we would like to have the equality

$$P\{\theta \in A\} = P\{\eta \in A\} = P\{\theta + c \in A\} = P\{\theta \in A - c\}.$$
Which measure π can satisfy this condition for all (Borel) sets A? Computing these probabilities by integrating the prior density and making a substitution, we get
$$\int_A \pi(\theta)\,d\theta = \int_{A-c} \pi(t)\,dt = \int_A \pi(\theta + c)\,d\theta$$
for any A, and therefore, π(θ) = π(θ + c) for all c ∈ R. This means π(θ) ≡ const. For example, π(θ) ≡ 1, the prior that we considered for the Normal model. It turns out that it suits any location family.

Scale family

Parameter θ > 0 is a scale parameter if the distribution of the data has the form
$$F_\theta(x) = F\left(\frac{x}{\theta}\right).$$
Seeking invariance under rescaling, let's look for a non-informative prior distribution that treats θ and its multiple η = cθ equally. Then
$$P\{\theta \in A\} = P\{\eta \in A\} = P\{c\theta \in A\} = P\{\theta \in A/c\},$$
and (substituting θ = ct in the integral),
$$\int_A \pi(\theta)\,d\theta = \int_{A/c} \pi(t)\,dt = \int_A \pi\left(\frac{\theta}{c}\right)\frac{d\theta}{c}$$
for any A, and therefore, π(θ) = (1/c)·π(θ/c) for all c > 0.

We can easily solve this! Put c = θ and obtain π(θ) = (1/θ)·π(1) ∼ 1/θ. Thus, we have derived a non-informative prior distribution for the scale family of models,
$$\pi(\theta) = \frac{1}{\theta}, \quad \theta > 0.$$
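(A quick check of this invariance, added here for concreteness: for an interval A = [a, b] with 0 < a < b and any c > 0,
$$\int_a^b \frac{d\theta}{\theta} = \ln\frac{b}{a} = \int_{a/c}^{b/c} \frac{d\theta}{\theta},$$
so A and A/c always receive the same measure, i.e., this prior indeed treats θ and cθ identically.)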

Since
$$\int_0^\infty \pi(\theta)\,d\theta = \int_0^\infty \frac{d\theta}{\theta} = \infty,$$
this is also an improper prior.

Example 4.4 (It makes sense! 4.14 of Berger). Assume X is a sample from the Poisson(θ) distribution. Use the improper non-informative prior π(θ) = 1/θ, θ > 0.

(a) Find the posterior density of θ given X = x for x ≠ (0, . . . , 0).

(b) If x = (0, . . . , 0), the posterior distribution does not exist... and it makes sense... why? What would be a standard non-Bayesian estimator in this case, and does it make sense?

·················· ♦

References: Berger 3.1-3.3, 4.1-4.2; Christensen 2.2, 5.1.1

Chapter 5

Loss Functions, Utility Theory, and Rao-Blackwellization

5.1 Utility Theory and Subjectively Chosen Loss Functions ...... 59 5.1.1 Construction of a utility function ...... 60 5.2 Standard loss functions and corresponding Bayes decision rules 66 5.2.1 Squared-error loss ...... 66 5.2.2 Absolute-error loss ...... 67 5.2.3 Zero-one loss ...... 68 5.2.4 Other loss functions ...... 70 5.3 Convexity and the Rao-Blackwell Theorem ...... 71 5.3.1 Convex functions ...... 71 5.3.2 Jensen’s Inequality ...... 72 5.3.3 Sufficient statistics ...... 73 5.3.4 Rao-Blackwell Theorem ...... 74

Similarly to the prior distributions, loss functions L(θ,δ) can be chosen either in a subjective way or in a convenient and standard way.

5.1 Utility Theory and Subjectively Chosen Loss Functions

Loss functions reflect our anticipation of the possible results of our decisions: gains and penalties. These are not trivial to predict, especially when randomness is involved. For example, when a student chooses a major or an investor chooses a portfolio, the results of their decisions are not immediately clear at all! Did you choose your major successfully? The success of many ventures depends on a number of factors and, perhaps, on random chance as well.

Utility theory explains how one can quantify and order these random rewards and use this ordering in decision making. It is understood that:

1. Every statistical decision results in a random reward r, which has some distribution of rewards P(r).

2. The utility function of rewards U(r) shows how valuable reward r is for you. You should prefer rewards with higher U(r).


3. Let P(r) be a distribution of rewards. Its value is given by the expectation E^{P(r)} U(r). That is, a distribution of rewards P2 is preferred to P1 if
$$E^{\mathcal{P}_2(r)}\,U(r) > E^{\mathcal{P}_1(r)}\,U(r).$$

Notation:
  P1 ≺ P2             means distribution of rewards P2 is preferred to P1
  P1 ≈ P2             means P1 and P2 are equivalent rewards, i.e., neither of them is preferred to the other
  w⟨r1⟩ + (1 − w)⟨r2⟩  means a random reward that equals r1 with probability w and equals r2 with probability (1 − w)

This is how we can compare different random rewards and choose statistical decisions that maximize the value of these rewards for us.

It only remains to construct the utility function U(r) that truly reflects our preferences among rewards. Then, the loss function will be understood as the negative utility,
$$L(\theta, \delta) = -U(r(\theta, \delta)).$$
In general, we choose decisions to maximize the utility or to minimize the loss.

5.1.1 Construction of a utility function

Step # 1. Two initial rewards. Choose two non-random rewards r1 ≺ r2 and assign some arbitrary utilities, U(r1) < U(r2). If there exist a smallest and a largest reward, then it is convenient to name them r1 and r2, but it is not necessary.

Indeed, in order to compare our decisions, we only need to determine utilities and losses up to a constant shift and a positive constant coefficient. That is, a loss L(θ, δ) is just as good to use as a + b·L(θ, δ) for any a and any b > 0.

So, we have two degrees of freedom, and we can choose U(r1) and U(r2) arbitrarily.

Step # 2. Intermediate rewards. Consider a distribution of rewards P3 = ½⟨r1⟩ + ½⟨r2⟩, which is a gamble, a lottery ticket that wins either r1 or r2 with a 50-50 chance. Which non-random reward r3 is equivalent to P3? If we are talking about monetary rewards, how much are you willing to pay for this lottery ticket? Let a fair price be r3. It reflects your personal preferences: how much you like to risk or to avoid risk (uncertainty, randomness).

When r3 is determined, compute its utility as
$$U(r_3) = E^{\mathcal{P}_3}\,U(r) = \tfrac{1}{2}U(r_1) + \tfrac{1}{2}U(r_2),$$
which is necessary for the rewards ⟨r3⟩ and P3 = ½⟨r1⟩ + ½⟨r2⟩ to be equivalent.

If U(r1) = 0 and U(r2) = 1, then U(r3) = 0.5. So, now we have determined utilities for three rewards: r1, r2, and r3.

More steps # 2. To build a utility function U(r), we need more points. Pick two rewards ri and rj whose utilities are already determined and find a non-random reward rk such that
$$\langle r_k \rangle \approx \tfrac{1}{2}\langle r_i \rangle + \tfrac{1}{2}\langle r_j \rangle.$$
Then set its utility
$$U(r_k) = \tfrac{1}{2}U(r_i) + \tfrac{1}{2}U(r_j).$$

Variations of step # 2.

2a. Asymmetric gambles. Instead of a 50-50 gamble ½⟨ri⟩ + ½⟨rj⟩, we can consider a distribution
$$\mathcal{P} = w\langle r_i \rangle + (1-w)\langle r_j \rangle,$$
a lottery ticket that gives reward ri with probability w and reward rj with probability (1 − w). Find a non-random reward rk ≈ P and set its utility as
$$U(r_k) = w\,U(r_i) + (1-w)\,U(r_j).$$

2b. Determine probabilities. Instead of a fair price for the lottery ticket P, we can set some non-random reward r and then determine the probabilities w and (1 − w) that make r ≈ P. (This is similar to the Progressive Insurance "Name-your-price tool", where you choose what you want to pay for the insurance coverage, and they find a coverage for that price.) Again, set U(r) = w·U(ri) + (1 − w)·U(rj).

2c. Rewards below r1. So far, we played with rewards between the initially chosen r1 and r2. However, if there is a reward r ≺ r1, then it is worse (less valuable for us) than any reward considered so far.

Apply Step 2b to this case by finding the probability w that makes r1 a fair price for a gamble between r and r2 (or another reward),
$$\langle r_1 \rangle \approx w\langle r \rangle + (1-w)\langle r_2 \rangle.$$
This implies U(r1) = w·U(r) + (1 − w)·U(r2), and we can find the utility U(r) from here:
$$U(r) = \frac{U(r_1) - (1-w)\,U(r_2)}{w}.$$

2d. Rewards above r2. Similarly, for a reward r ≻ r2 that is more valuable than r2, find the probability w that makes
$$\langle r_2 \rangle \approx w\langle r_1 \rangle + (1-w)\langle r \rangle.$$
This implies U(r2) = w·U(r1) + (1 − w)·U(r), hence,
$$U(r) = \frac{U(r_2) - w\,U(r_1)}{1-w}.$$
Repeat Step 2 and its variations to obtain a sufficient number of points that you can connect to draw a utility function U(r).

Step # 3. Cross-validation. When U(r) is sketched, verify a few random combinations of rewards. Choose some pairs rm and rn that have not been used so far and determine either a reward r or a probability w, applying some version of Step 2,
$$\langle r \rangle \approx w\langle r_m \rangle + (1-w)\langle r_n \rangle.$$
This should imply U(r) = w·U(rm) + (1 − w)·U(rn). Compare this U(r) with the utility of reward r obtained from the sketched function U. Does the point (r, U(r)) lie on the graph of our utility function? If not, even approximately, then revise some of the previous steps.

Example 5.1 (Invest or not invest?). Let's decide whether we should invest in today's economy or not, based on our personal utility function. For simplicity, let the action space consist of two decisions,

A = {δ1, δ2} = {invest, don't invest},

and the parameter space consist of two possible states of the economy,

Θ = {θ1, θ2} = {market goes up, market goes down}.

Then there are 4 possible non-random rewards:

                 Invest    Don't invest
  Market up        r1          r2
  Market down      r3          r4

Here r1 is a real gain, r2 is a lost opportunity, r3 is a real loss, and r4 is a correct decision, although it is not a monetary gain. Therefore, these rewards are ordered as

r3 ≺ r2 ≺ r4 ≺ r1.

Let us determine all the utilities and then decide whether or not we should invest today,

given a priori that the market will go up with probability π(θ1) = 0.52.

Solution.

Step 1. Let U(r3) = 0 and U(r1) = 1.

Step 2. The rewards are already given, so we apply version 2b.

I'd say r2 (lost opportunity) is closer to r3 (loss) than to r1 (gain), so it is about equivalent to the gamble 0.55⟨r3⟩ + 0.45⟨r1⟩. Therefore,

U(r2) = 0.55·U(r3) + 0.45·U(r1) = 0.45.

Also, r4 (right choice but no gain) is a lot closer to r2 (lost opportunity, but also no dollar gain or loss) than to r1 (real gain). Therefore, I'd estimate that

⟨r4⟩ ≈ 0.9⟨r2⟩ + 0.1⟨r1⟩,

so that

U(r4) = (0.9)(0.45) + (0.1)(1) = 0.505.

We have the utility function U(r3)=0, U(r2)=0.45, U(r4)=0.505, and U(r1)=1.

Step 3. Let's use some unused combinations for cross-validation. For me, r2 would be about equivalent to an 80-20 gamble between r4 and r3. This is pretty subjective, isn't it? But I know that the two rewards r2 and r4, which carry no monetary gain or loss, are much closer to each other than to a real gain or loss.

So, I claim

⟨r2⟩ ≈ 0.8⟨r4⟩ + 0.2⟨r3⟩,

from where

U(r2) = (0.8)·U(r4) + (0.2)·U(r3) = (0.8)(0.505) + (0.2)(0) = 0.404.

That's close to our previously determined U(r2) = 0.45, so I will not revise our utility function.

Having determined all the utilities, should we invest? With the given prior distribution, let us find the Bayes decision.

Compute the expected utilities of both possible decisions:

E_{δ1} U(r) = π(θ1)U(r1) + π(θ2)U(r3) = (0.52)(1) + (0.48)(0) = 0.52,
E_{δ2} U(r) = π(θ1)U(r2) + π(θ2)U(r4) = (0.52)(0.45) + (0.48)(0.505) = 0.4764.

Translating these into our old terms, the Bayes risk of δ1 is −0.52, and the Bayes risk of δ2 is −0.4764. We go with the higher expected utility and the lower Bayes risk, and therefore, our best (Bayes) decision is δ1: to invest.

Notice that with a more pessimistic opinion about the state of the economy, say, π(θ1) = 0.4 and π(θ2) = 0.6, the expected utilities would have been E_{δ1} U(r) = 0.4 and E_{δ2} U(r) = 0.483, preventing us from investing. ♦
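(The arithmetic of this example is easy to reproduce; here is a Python sketch with our own variable names.)

def expected_utility(p_up, u_if_up, u_if_down):
    return p_up * u_if_up + (1 - p_up) * u_if_down

U = {'r1': 1.0, 'r2': 0.45, 'r3': 0.0, 'r4': 0.505}        # utilities from Steps 1-2

for p_up in (0.52, 0.40):
    eu_invest = expected_utility(p_up, U['r1'], U['r3'])   # decision delta_1
    eu_dont   = expected_utility(p_up, U['r2'], U['r4'])   # decision delta_2
    print(p_up, eu_invest, eu_dont,
          'invest' if eu_invest > eu_dont else "don't invest")
# 0.52: 0.52 vs 0.4764 -> invest;  0.40: 0.40 vs 0.483 -> don't invest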

Example 5.2 (Personal utility function for money). Let me briefly try to apply steps 1-3 to construct my own personal utility function for money.

Two meaningful rewards for me are $0 and $1,000, so let U(0) = 0 and U(1000) = 1. Then,

400 ≈ ½⟨0⟩ + ½⟨1000⟩    ⇒  U(400) = 0.5U(0) + 0.5U(1000) = 0.5

(that is, I'd agree to pay $400 for a 50-50 gamble between $0 and $1000, but I'd refuse to pay $500 for such a gamble; I'd prefer $500 in cash);

600 ≈ ½⟨400⟩ + ½⟨1000⟩  ⇒  U(600) = 0.5U(400) + 0.5U(1000) = 0.75
750 ≈ ½⟨600⟩ + ½⟨1000⟩  ⇒  U(750) = 0.5U(600) + 0.5U(1000) = 0.875
480 ≈ ½⟨400⟩ + ½⟨600⟩   ⇒  U(480) = 0.5U(400) + 0.5U(600) = 0.625
140 ≈ ½⟨0⟩ + ½⟨400⟩     ⇒  U(140) = 0.5U(0) + 0.5U(400) = 0.25
240 ≈ ½⟨140⟩ + ½⟨400⟩   ⇒  U(240) = 0.5U(140) + 0.5U(400) = 0.375

Check: is 400 ≈ ½⟨240⟩ + ½⟨480⟩? No, I would not pay $400 for this gamble. It is probably worth about $250 to me. Then, let us redefine U(350) = 0.5 and U(580) = 0.75, because 580 ≈ ½⟨350⟩ + ½⟨1000⟩.

Also, include some negative rewards, or money losses:

0 ≈ (2/5)⟨−500⟩ + (3/5)⟨500⟩  ⇒  U(−500) = −1.5U(500) ≈ (−1.5)(0.7) = −1.05
−100 ≈ ½⟨−500⟩ + ½⟨500⟩      ⇒  U(−100) = 0.5U(−500) + 0.5U(500) = −0.175
−150 ≈ ½⟨−500⟩ + ½⟨0⟩        ⇒  U(−150) = 0.5U(−500) + 0.5U(0) = −0.525
−500 ≈ ¼⟨−1000⟩ + ¾⟨0⟩       ⇒  U(−1000) = 4U(−500) = −4.2

(Some more checks would be useful for the final tuning.) We have then obtained the following utility function for money between r = −1000 and r = 1000:

  r     −1000  −500   −150    −100    0   140   240    350   480    500   580   750    1000
  U(r)  −4.2   −1.05  −0.525  −0.175  0   0.25  0.375  0.5   0.625  0.7   0.75  0.875  1

Its graph is in Figure 5.1. ♦

FIGURE 5.1: Constructed utility function for money.

Notice several important and common features of U(r), the personal utility function for money:

1. U(r) is concave for r > 0. This means risk aversion. That is, I would not pay for additional risk (uncertainty), and I would agree to take the risk only if its expected reward is higher than my payment,
$$w\,U(r_1) + (1-w)\,U(r_2) = E\,U\big(w\langle r_1\rangle + (1-w)\langle r_2\rangle\big) < U\big(w r_1 + (1-w) r_2\big).$$
(Notice that wU(r1) + (1 − w)U(r2) < U(wr1 + (1 − w)r2) is exactly the concavity of U.)

2. U(r) is practically flat and, most likely, bounded for large r. It means that really large rewards r make little difference for an individual. While I can feel the difference between $10 and $20, the difference between $10,000 and $10,010 is very unclear to me. Further, since I have no way to spend one billion dollars, a gift of one billion means the same to me as a gift of one trillion.

3. U(r) is almost linear around r ≈ 0. For small amounts, I do not mind gambling.

Since the utility function is mostly concave, the loss L = −U is mostly convex. Almost all commonly used loss functions are convex.

Example 5.3 (We have no reason to fight! Berger 4.9 from DeGroot). Assume that Mr. A and Mr. B have the same utility function for money, given by U(x) = x^{1/3} = ∛x. Suppose now that one of the men receives, as a gift, a lottery ticket which yields either a reward of r > 0 dollars or a reward of 0 dollars, with probability ½ each. Show that there exists a number b > 0 having the following property: regardless of which man receives the lottery ticket, he can sell it to the other man for b dollars and the sale will be advantageous to both men.

·················· ♦

Example 5.4 (Food for thought). The statement of the previous example remains correct even if the ticket's winning probability is some p ∈ (0, 1) instead of ½. ♦

Example 5.5 (Another exercise). All right, Mr. A and Mr. B calculated the good price b, and then Mr. A sold the ticket to Mr. B for $b. Now, perhaps Mr. A would like to buy it back? Show that now there is no price that can make this deal possible. ♦

5.2 Standard loss functions and corresponding Bayes decision rules

In this section, we list several loss functions that are most frequently used in statistical decision making. Notice that one decision will remain better than another in the Bayes or minimax sense if the entire loss function is increased by a constant or multiplied by a positive coefficient. Therefore,

– we can drop constant coefficients and shifts;

– we can consider only non-negative losses. As we observed in the previous section, utility functions are usually bounded from above. Then, losses are bounded from below. Without loss of generality, add this (constant) lower bound to the loss function, making it non-negative, L(θ, δ) ≥ 0.

5.2.1 Squared-error loss

The squared-error loss function
$$L(\theta, \delta) = (\theta - \delta)^2$$
is used for parameter estimation problems. The parameter is θ, and its estimator is δ.

The corresponding squared-error posterior risk is therefore
$$\rho(\pi, \delta \mid X) = E_\theta\left\{(\theta - \delta)^2 \mid X\right\} = E_\theta\left\{(\theta - \mu_X)^2 \mid X\right\} + (\mu_X - \delta(X))^2,$$
which can be interpreted as the posterior variance of θ plus the posterior bias squared. Here µ_X = E{θ | X} is the posterior mean of θ.

What is the Bayes decision with respect to the squared-error loss? It is the one that minimizes the posterior risk, which we'll write as
$$\rho(\pi, \delta \mid X) = E_\theta\left\{(\theta - \delta)^2 \mid X\right\} = \delta^2 - 2\delta\,E\{\theta \mid X\} + E\{\theta^2 \mid X\}.$$
The minimum of this posterior risk is attained at
$$\delta_{\mathrm{Bayes}} = -\frac{-2\,E\{\theta \mid X\}}{2} = E\{\theta \mid X\} = \mu_X.$$

Conclusion: posterior mean of θ is the Bayes decision with respect to the squared-error loss.

5.2.2 Absolute-error loss

The absolute-error loss function is
$$L(\theta, \delta) = |\theta - \delta|.$$
It does not penalize large deviations of the estimator δ from the parameter θ as much as the squared-error loss does.

What should the Bayes decision look like under this loss? Apparently, it's the posterior median.

Proof: Consider M, the median of π(θ | X), and δ, some other decision, and compare their losses.

Case 1: let δ < M. The difference of the losses is
$$L(\theta, \delta) - L(\theta, M) = |\theta - \delta| - |\theta - M| = \begin{cases} M - \delta & \text{for } \theta \ge M, \\ \text{a linear continuous function} & \text{for } \delta \le \theta \le M, \\ -(M - \delta) & \text{for } \theta \le \delta. \end{cases}$$
This function of θ is shown in Figure 5.2. So, L(θ, δ) − L(θ, M) ≥ −(M − δ) for θ < M, and it equals (M − δ) for θ ≥ M. Taking expected values with respect to the posterior distribution π(θ | X), we obtain the difference of the posterior risks,

$$\rho(\pi, \delta \mid X) - \rho(\pi, M \mid X) \ge -(M-\delta)\,P\{\theta < M \mid X\} + (M-\delta)\,P\{\theta \ge M \mid X\} = (M-\delta)\big(P\{\theta \ge M \mid X\} - P\{\theta < M \mid X\}\big).$$

FIGURE 5.2: Difference of loss functions for the posterior median M and another decision δ.

It's time to recall that M is the median of π(θ | X), and therefore,
$$P\{\theta \ge M \mid X\} \ge P\{\theta < M \mid X\}.$$

(For a continuous posterior, P{θ < M | X} = P{θ ≥ M | X} = 1/2.)

Conclusion? ρ(π, δ | X) − ρ(π, M | X) ≥ 0.

Case 2, δ > M, is very similar. In both cases, we get ρ(π, δ | X) ≥ ρ(π, M | X). This precisely means that M is the Bayes rule: no other decision δ can have a lower posterior risk.

Conclusion: posterior median of θ is the Bayes decision with respect to the absolute-error loss.

5.2.3 Zero-one loss

Generally, the zero-one loss function gives a penalty of 1 for any wrong decision and no penalty for the correct decision. This makes sense in hypothesis testing: Type I and Type II errors are penalized, while correct acceptances and correct rejections are not.

In estimation, a 0-1 loss can be defined as
$$L(\theta, \delta) = \mathbf{I}\{\theta \ne \delta\} = \begin{cases} 1 & \text{if } \theta \ne \delta \\ 0 & \text{if } \theta = \delta \end{cases}$$
However, this only makes sense in discrete cases, when θ = δ with non-zero probability. Then, what is the Bayes decision under this loss function?

FIGURE 5.3: The estimator δ of a continuous parameter θ under the 0-1 loss function has to maximize the posterior probability of the interval δ ± ε; the shaded area equals P{δ − ε ≤ θ ≤ δ + ε | X}.

Compute the posterior risk,
$$\rho(\pi, \delta \mid X) = E_\theta\{\mathbf{I}\{\theta \ne \delta\} \mid X\} = P\{\theta \ne \delta \mid X\} = 1 - P\{\theta = \delta \mid X\} = 1 - \pi(\delta \mid X).$$
Now, the Bayes rule should minimize this posterior risk. To do that, it maximizes π(δ | X). The Bayes rule is the point of maximum of the posterior distribution (probability mass function) π(θ | X)! What do we call it? It is the posterior mode.

Conclusion: For a discrete parameter θ, posterior mode is the Bayes decision with respect to the zero-one loss.

In continuous cases, the probability of θ = δ is 0, so there is nothing to maximize. For this reason, the 0-1 loss is often defined as
$$L(\theta, \delta) = \mathbf{I}\{|\theta - \delta| > \varepsilon\} = \begin{cases} 1 & \text{if } |\theta - \delta| > \varepsilon \\ 0 & \text{if } |\theta - \delta| \le \varepsilon \end{cases}$$
allowing the estimator δ to differ from the parameter θ by at most a small ε. The Bayes decision δ in this case maximizes the probability P{δ − ε ≤ θ ≤ δ + ε | X}, and it converges to the posterior mode as ε → 0, as shown in Figure 5.3.

DEFINITION 5.1 The generalized maximum likelihood estimator of the parameter θ is the posterior mode, the value of θ that maximizes the posterior density or the posterior probability mass function.

Do you see a connection with the method of maximum likelihood? Maximum likelihood estimators (MLE) maximize the joint density or probability mass function f(X | θ), "the model", while generalized maximum likelihood estimators (GMLE) maximize the posterior, which is proportional to f(X | θ)π(θ). Thus, the MLE is a special case of the GMLE when the prior distribution π is Uniform.

We've just obtained an interesting result: maximum likelihood estimators are Bayes decisions under the zero-one loss function and a Uniform prior distribution. If θ is continuous, its maximum likelihood estimator is asymptotically Bayes, under the zero-one loss function L(θ, δ) = I{|θ − δ| > ε} with ε ↓ 0.

5.2.4 Other loss functions

A few words about some other commonly used loss functions.

Weighted loss functions allow larger errors when the parameter is large. For example, a weighted squared-error loss function is
$$L_w(\theta, \delta) = w(\theta)\,(\theta - \delta)^2,$$
where the weight w(θ) decreases as |θ| gets large. It just means that a 1-inch error on a 1-mile parameter θ causes less penalty than a 1-inch error on a 1-foot θ.

A LinEx loss function (linear-exponential),
$$L(\theta, \delta) = e^{-\alpha(\delta - \theta)} - 1 + \alpha(\delta - \theta),$$
reflects different penalties for underestimation and overestimation. For example, by underestimating the value of a contract by $1000, we probably lose just $1000. Overestimating the same contract by the same amount can lead to the loss of a customer and result in no deal. The squared-error, absolute-error, and zero-one losses considered so far are symmetric.

Asymptotically, the LinEx loss function is exponentially growing on one side and linear on the other. As shown in Figure 5.4, it penalizes underestimation exponentially strongly if α > 0, and it strongly penalizes overestimation if α < 0.

Other asymmetric loss functions are also used, and some of them give comparable penalties on both sides. For example, a linear loss function
$$L(\theta, \delta) = \begin{cases} K_1(\theta - \delta) & \text{if } \theta \ge \delta \\ K_2(\delta - \theta) & \text{if } \theta \le \delta \end{cases}$$
is an asymmetric analogue of the absolute-error loss.

For confidence sets, a loss function
$$L(\theta, S) = \lambda|S| + \mathbf{I}\{\theta \notin S\}$$

FIGURE 5.4: Linear-exponential loss function (LinEx) with α > 0 and α < 0, plotted against δ − θ.

is often used, where S is a set for the parameter θ, |S| is the size of this set (such as its length, volume, or number of elements), and λ is a constant that shows how much we can neglect the size of the confidence set in favor of the coverage probability. The risk corresponding to this loss is a combination of the expected size of the set and its probability of non-coverage, i.e., of missing θ.

In sequential , when observations are collected one by one, and a decides when to stop data collection, the cost of observations is usually added to the loss function,

L(θ,δ(X1,...,Xn)) + Cn. The optimal decisions (stopping rules) will then find an optimal balance between the sample size n and the accuracy of reported results.

5.3 Convexity and the Rao-Blackwell Theorem

As we noticed in Section 5.1, the constructed loss function for money appears to be convex, reflecting our risk-averse nature. The commonly used squared-error, absolute-error, and LinEx loss functions are also convex as functions of θ and δ. In this section, we'll use this property to obtain better statistical decisions.

5.3.1 Convex functions

Let us start with two equivalent definitions of convexity. 72 STAT 618 Bayesian Statistics Lecture Notes

FIGURE 5.5: Two definitions of a convex function.

DEFINITION 5.2 Function g(x) is convex on an interval [a, b] if for any points x, y ∈ [a, b] and any coefficient α ∈ [0, 1],
$$g[\alpha x + (1-\alpha)y] \le \alpha g(x) + (1-\alpha)g(y).$$
Equivalently, function g(x) is convex on an interval [a, b] if for any point x0 ∈ [a, b] there exists a slope λ ∈ [−∞, ∞] such that for all x ∈ [a, b],
$$g(x_0) + \lambda(x - x_0) \le g(x).$$

The first definition says that a convex function stays below any secant line connecting two points of its graph, as in Figure 5.5 (left). By the second definition, the function is convex if a tangent line drawn at any point of its graph stays below the function itself on its entire domain, as in Figure 5.5 (right).

Next, we’ll need a classical result about the expected value of a convex function, called Jensen’s Inequality.

5.3.2 Jensen’s Inequality

For a convex function g(x) and a random variable X with finite expectation E(X) < ∞,
$$g(E\,X) \le E\,g(X).$$

Proof: Since g is convex, for any x0 there exists a slope λ for which
$$g(x_0) + \lambda(x - x_0) \le g(x)$$
holds for any x. Substitute x = X and x0 = E X:
$$g(E\,X) + \lambda(X - E\,X) \le g(X).$$
Finally, take expectations of both sides:
$$g(E\,X) \le E\,g(X).$$

5.3.3 Sufficient statistics

Next, let's recall (from Mathematical Statistics or Statistical Inference courses) the definition of a sufficient statistic.

DEFINITION 5.3 A statistic T(X) computed from a sample X is a sufficient statistic for the parameter θ if f(X | T(X)), the conditional distribution of X given T(X), is independent of θ.

It means that in order to do inference about θ, it is sufficient to use just the sufficient statistic T instead of the whole sample X. Given T , the distribution of data is the same for all θ, so no more information beyond T (X) can be of any use for the inference about θ.

A working tool for finding sufficient statistics is the Neyman-Fisher Factorization Theorem.

Factorization Theorem. A statistic T(X) is sufficient for the parameter θ if and only if the joint density (or pmf) of X factorizes as
$$f(X \mid \theta) = g_\theta(T(X))\,h(X),$$
where g_θ(T(X)) depends on the data X only through the statistic T, and h(X) does not contain θ.

Example 5.6. Use the Factorization Theorem to find sufficient statistics for samples from
– the Poisson(θ) distribution,
– the Normal(µ, σ) distribution with known σ,
– the Normal(µ, σ) distribution with both µ and σ unknown,
– the Cauchy(θ) distribution, which has density
$$f(x \mid \theta) = \frac{\theta}{\pi(\theta^2 + x^2)}, \quad x \in (-\infty, \infty).$$

·················· ♦

We are now ready for the main result of this section: the Rao-Blackwell Theorem in its general form, for any convex loss function.

5.3.4 Rao-Blackwell Theorem

Let

– L(θ, δ) be a loss function that is convex in δ for every θ,

– T be a sufficient statistic for θ,

– δ0 be some decision.

Define a new decision δ1 := E{δ0 | T} as the conditional expected value of δ0 given T(X). Then the frequentist risk of δ1 never exceeds the frequentist risk of δ0,

$$R(\theta, \delta_1) \le R(\theta, \delta_0) \quad \text{for all } \theta \in \Theta.$$
Proof: By Jensen's inequality,
$$L(\theta, \delta_1) = L\big(\theta, E\{\delta_0 \mid T\}\big) \le E\{L(\theta, \delta_0) \mid T\}.$$
Take expectations:
$$R(\theta, \delta_1) = E\,L(\theta, \delta_1(T)) \le E\,E\{L(\theta, \delta_0) \mid T\} = E\,L(\theta, \delta_0) = R(\theta, \delta_0).$$
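(A Monte Carlo illustration, added here as a sketch and not taken from the original notes: for a Poisson(θ) sample, take the naive decision δ0 = X1; since T = ΣXi is sufficient and, by symmetry, E(X1 | T) = T/n, Rao-Blackwellization produces δ1 = X̄. The Python code below compares their frequentist risks under the squared-error loss.)

import numpy as np
rng = np.random.default_rng(1)

theta, n, reps = 3.0, 10, 100_000
X = rng.poisson(theta, size=(reps, n))

delta0 = X[:, 0]            # uses only the first observation
delta1 = X.mean(axis=1)     # E(delta0 | T) = Xbar, the Rao-Blackwellized decision

print(np.mean((delta0 - theta) ** 2))   # risk ~ Var(X1) = theta = 3.0
print(np.mean((delta1 - theta) ** 2))   # risk ~ theta/n = 0.3, never larger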

Question: so, where did we use sufficiency in this proof? Does it work for statistics that are not sufficient?

Example 5.7. Consider a sample from the Uniform(0, θ) distribution. To estimate θ, someone uses the statistic δ0 = 2X̄. It is unbiased! Can you propose a better estimator, under the squared-error loss or any other convex loss function?

·················· ♦

References: Berger chap. 2; also sections 1.7, 1.8

Chapter 6

Intro to WinBUGS

6.1 Installation ...... 75 6.2 Basic Program: Model, Data, and Initialization ...... 76 6.2.1 Model specification ...... 76 6.3 Entering Data ...... 79 6.3.1 Initialization ...... 81

WinBUGS (Bayesian inference Using Gibbs Sampling) for Windows is a software package created specifically for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) techniques.

6.1 Installation

With the creation of a new version of BUGS called OpenBUGS, WinBUGS became free of charge; it can be downloaded and installed from the Medical Research Council site

http://www.mrc-bsu.cam.ac.uk/software/bugs/the-bugs-project-winbugs/

This is a version for Windows. For Mac users, a different software package called JAGS (Just Another Gibbs Sampler) is recommended. It is also free, available for download at http://mcmc-jags.sourceforge.net/. Installation was explained in detail in a separate handout. The main steps are:

1. Find out whether you are running a 32-bit or a 64-bit machine.

2. For a 32-bit machine, download and install WinBUGS14.exe from the site above. For a 64-bit machine, download a zipped version and unzip it.

3. Find and execute the file WinBUGS14.exe. Create a shortcut of WinBUGS.exe to desktop for your convenience.

4. Register for eternal free use. From the same site, download a patch under the link “Download and install the patch for 1.4.3”, right-click on it, and save it as a txt file. Open this file in WinBUGS with File → Open [this patch file], then decode it with Tools → Decode → Decode all.

5. Install the “immortality key” with similar steps. Find the link under “Get the free key for unrestricted use by clicking here”. Again, save as txt file, open it in WinBUGS, and decode.

WinBUGS is now ready for our use!

6.2 Basic Program: Model, Data, and Initialization

Three basic parts of a WinBUGS program are the model specification, the data set(s), and the initial values for the Markov chain Monte Carlo (we’ll discuss MCMC a little later). The model block is necessary, the other two blocks are not. In no-data problems, the data block can be omitted. Also, instead of initial values, we can ask WinBUGS to generate them.

6.2.1 Model specification

The model block contains descriptions of all the distributions in the problem, including the distribution of the data, the prior distribution of unknown parameters, and possible hyper-prior distributions of the parameters of the prior.

Each distribution name starts with the letter “d” followed by an abbreviated distribution name. The list of distributions supported by WinBUGS is given in Table 6.1. A few remarks on them:

– dcat(p[]) is an arbitrary discrete distribution given by probabilities p[i] specified by us.

– Negative Binomial variable here represents the number of failures before the r-th success, so that x =0, 1, 2,....

– The Exponential distribution parameter is the frequency λ, so the density is f(x) = λ e^(−λx).

– The Double Exponential distribution has the density f(x) = (τ/2) e^(−τ|x − µ|) for x ∈ R.

– For Pareto(α, c), c is the scale parameter, so the density is f(x) = α c^α x^(−(α+1)).

– A general version of a non-central Student’s t-distribution is used, with k being the degrees of freedom, and also µ and τ being the location and scale parameters.

– Parameters of multivariate distributions are vectors and matrices that have to be specified in the model or data blocks.

Discrete distributions
  Bernoulli               dbern(p)
  Binomial                dbin(p,n)
  Categorical             dcat(p[])
  Negative Binomial       dnegbin(p,r)
  Poisson                 dpois(lambda)

Continuous distributions
  Beta                    dbeta(a,b)
  Chi-squared             dchisqr(k)
  Double Exponential      ddexp(mu,tau)
  Exponential             dexp(lambda)
  Gamma                   dgamma(r,lambda)
  Log-normal              dlnorm(mu,tau)
  Logistic                dlogis(mu,tau)
  Normal                  dnorm(mu,tau)
  Pareto                  dpar(alpha,c)
  Student’s t             dt(mu,tau,k)
  Uniform                 dunif(a,b)
  Weibull                 dweib(v,lambda)

Discrete multivariate distributions
  Multinomial             dmulti(p[],N)

Continuous multivariate distributions
  Dirichlet               ddirch(alpha[])
  Multivariate Normal     dmnorm(mu[],T[,])
  Multivariate Student’s t  dmt(mu[],T[,],k)
  Wishart                 dwish(R[,],k)

TABLE 6.1: List of distributions supported by WinBUGS.

For a simple example, consider the following problem:

Model:  X ~ Normal(θ, σ)
Prior:  θ ~ Normal(0, 1)
where σ = 10.

A WinBUGS code for this model is:

model {
  X ~ dnorm(theta, sigma)
  theta ~ dnorm(0, 1)
  sigma <- 10
}

To execute this code in WinBUGS, we follow several steps:

1. Write. Open a new project (File → New or Ctrl-N) and write this code.

2. Verify. Go to Model → Specification to open the specification tool box. Highlight the word “model” in your code and click “check model” in the specification tool. Watch for the response in the lower left corner. A correct response is “model is syntactically correct”; otherwise, WinBUGS will point to the first error.

3. Compile. Press ”compile” in the specification tool. Watch for the response “model compiled” or find errors.

4. Initialize. Since we did not write any initial values, ask WinBUGS to generate its own ones by pressing ”Gen inits” in the specification tool. The correct response is “initial values generated, model initialized”.

5. Specify the monitored variables. Go to Inference → Samples to open the Sample Monitor Tool and enter the variables of interest in the first box, “node”, one at a time. These are the variables that we want to monitor: get their estimated parameters, plot their densities, etc. After entering the name of each monitored variable, press “set”. For our example, let us enter “X”, press “set”, then enter “theta”, and press “set” again. We can also specify the percentiles that we would like to monitor (for example, highlight “median” to find the Bayes estimator of θ under the absolute-error loss).

6. Execute the code. Go to Model → Update to open the Update tool, then press “update”. Right before this, we can also specify the desired number of Monte Carlo runs in the box “updates”. The default number is 1000.

7. Read results. In the Sample Monitor Tool, enter a star “*” into the box “node”, then press the option we’d like to use. Press “stats” to see the main statistics such as the mean, median, standard error, as well as the Monte Carlo standard error that shows

node    mean      sd      MC error   2.5%     median    97.5%   start   sample
X       0.04486   1.029   0.03788    -2.002   0.03069   2.003   1       1000
theta   0.04249   1.005   0.03748    -1.973   0.02815   2.008   1       1000

FIGURE 6.1: Statistics and densities generated by WinBUGS over 1000 Monte Carlo runs.

node    mean        sd       MC error   2.5%     median      97.5%   start   sample
X       -0.002583   1.044    0.006926   -2.051   -0.004549   2.06    1       20000
theta   -5.412E-4   0.9968   0.006703   -1.968   -0.00746    1.972   1       20000

FIGURE 6.2: Statistics and densities generated by WinBUGS over 21,000 MC runs.

the accuracy of Monte Carlo generated estimators. Press “density” to see graphs of densities and probability mass functions of all the monitored variables. Press “history” to see the plot of all the simulated values of these variables, and press “coda” if you’d like to get a list of all these values.

In our example, the statistics and densities will look as in Figure 6.1. They are generated by the algorithm and averaged over 1000 Monte Carlo runs. Choosing a larger number of runs will lead to more accurate statistics and smoother density estimates, with a smaller MC error. For example, we used 21,000 MC runs to get the results in Figure 6.2, for the same problem.

6.3 Entering Data

The data X = (X1,X2,...,Xn) as well as some constant values of participating parameters can be entered separately from the model statement. They will be listed in the “list” block.

For example, the following WinBUGS code can be used to solve one of our future homework exercises:

“Six measurements X1, ..., X6 from the Normal(θ, 1) distribution are made. The first five are 3, 4, 5, 6, and 7, but X6 got lost. Parameter θ is unknown; it has the Normal(4, 1) prior distribution. Derive the predictive distribution of X6 and a 90% predictive interval for it.”

# MODEL
model {
  for (i in 1:5) { X[i] ~ dnorm(theta, sigma) }
  theta ~ dnorm(mu, tau)
  sigma <- 1
  tau <- 1
  mu <- 4
  Z ~ dnorm(theta, 1)
}
# DATA
list(X = c(3, 4, 5, 6, 7))

In the first line of the model block, we indicated a sample of size 5 from Normal distribution. Then, we specified the prior distribution of θ and the constant “known” parameters. Also, we introduced the variable Z that we need to predict in this problem.

The actual data are written as a list list(X=c(3,4,5,6,7)). To let the WinBUGS use these data, we should enter them after checking the model but before compiling it. That is, between Step 2 “Verify” and Step 3 “Compile”, highlight the word “list” and click on “load data” in the Specification Tool. The correct response is “data loaded”.

Constant parameters can also be entered in the same list, as part of the data. So, here is an equivalent code:

# MODEL
model {
  for (i in 1:5) { X[i] ~ dnorm(theta, sigma) }
  theta ~ dnorm(mu, tau)
  Z ~ dnorm(theta, 1)
}
# DATA
list(X = c(3, 4, 5, 6, 7), sigma = 1, tau = 1, mu = 4)

The lines that start with # are just comments for us.

In this problem, we should monitor the variable Z, which we need to predict. The shape of its density suggests the Normal distribution, and its estimated parameters are given in “stats”. After 10,000 MC runs, I obtained the estimates E(Z | X) = 4.836 and Std(Z) = 1.067.

6.3.1 Initialization

The Markov chain Monte Carlo (MCMC) underlying the computations in WinBUGS is an iterative algorithm. It starts with some initial values of all non-constant random variables and proceeds through a sequence of iterations until it is assumed “converged”. Convergence is postulated at the moment when the generated results get very close to each other. However, especially in multi-layer hierarchical models involving a number of parameters, convergence may not be trivial. In particular, the final result may depend on where the algorithm started, the initial values. One way to verify the stability of the algorithm and the reliability of its results is to start from totally different initial values and see whether the algorithm nevertheless converges to roughly the same conclusion.

Initial values are also written as a “list”, just like the data. We can specify more than one list, and in this case we have to tell WinBUGS how many of them we’d like to use. Before compiling the model, enter the desired number of initializations under “num of chains” in the Specification Tool. Then press “compile”, and instead of generating initial values with “gen inits”, enter our prepared lists. Highlight the first word “list” and press “load inits”. WinBUGS will accept our initial values for chain 1 but will let us know in its response statement that the other chains are still uninitialized. Then highlight the second word “list” and press “load inits” again. We repeat this until all the initial values are entered. At this point, the correct response will be “model is initialized”.

For example, the following code can be used to solve the next exercise, #2 of Homework #7, where we modified the previous example by letting the prior parameter µ have its own prior distribution:

“Consider the previous exercise again, with one correction. Replace the Normal(4, 1) prior distribution of θ with the Normal(µ, 1) distribution, where the hyper-parameter µ is unknown, and it has its own prior distribution, which is Normal(4, 1).”

# MODEL
model {
  for (i in 1:5) { X[i] ~ dnorm(theta, 1) }
  theta ~ dnorm(mu, 1)
  mu ~ dnorm(4, 1)
  Z ~ dnorm(theta, 1)
}
# DATA
list(X = c(3, 4, 5, 6, 7))
# INITIAL VALUES
list(theta = -10, Z = 0, mu = 1)
list(theta = 0, Z = 0, mu = 1)
list(theta = 10, Z = 0, mu = 1)

The last three lines specify the initial values. All the random quantities have to be initialized. Each line generates a separate sequence of Monte Carlo runs that starts from the given values. (Seeking brevity, I wrote the non-random parameters as part of the specified distributions.)

FIGURE 6.3: Statistics and densities generated by WinBUGS over 21,000 MC runs.

In this problem, we trace the variable Z, so we indicate it in the Sample Monitor Tool as the node. After executing the code, enter “*” for this node to request results. Then press “history” to see the actually generated values of Z. Here is what I obtained, Figure 6.3.

As it says in the top right corner, these are simulations of 3 Markov chains Monte Carlo, each chain starting from its own initial values that we requested. Over 1000 Monte Carlo runs, there were 3000 generated values of the variable Z. The three chains are depicted with different colors. As we can see from this plot, they seemingly converged to the same result; no chain’s graph looks too different from the others.

This is one way to measure the stability of our Monte Carlo algorithm. Regardless of the initial values, the results are roughly the same, subject only to random variation. This variation cannot be avoided, due to the use of random simulations, but it can be reduced by choosing a larger number of Monte Carlo runs.

Chapter 7

Bayesian Inference: Estimation, Hypothesis Testing, Prediction

7.1 Bayesian estimation ...... 83 7.1.1 Precision evaluation ...... 85 7.2 Bayesian credible sets ...... 87 7.3 Bayesian hypothesis testing ...... 91 7.3.1 Zero-wi loss function ...... 92 7.3.2 Bayesian classification ...... 93 7.3.3 Bayes factors ...... 94 7.4 Bayesian prediction ...... 94

By now, we have obtained all three components needed for Bayesian decision making. We collected the data, and we determined the prior distribution and the loss function. Then, combining the data and the prior, we obtained the posterior distribution. All the knowledge about the unknown parameter is now contained in the posterior, and that is what we’ll use for further statistical analysis (Figure 7.1).

7.1 Bayesian estimation

In this section, we use Bayesian techniques to compute a point estimator of the unknown parameter θ and to evaluate its performance (in a Bayesian way!).

As we saw in chap. 5, the Bayes estimator may take different forms, depending on the loss function. For example, under the squared-error, the absolute-error, and the zero-one loss functions, the Bayes estimators are, respectively, the mean, the median, and the mode of the posterior distribution.

The most commonly used estimator is the posterior mean,

θ̂_Bayes = E{θ | X} =

  ∑_θ θ π(θ | X) = (∑_θ θ f(X | θ) π(θ)) / (∑_θ f(X | θ) π(θ))        if θ is discrete,

  ∫ θ π(θ | X) dθ = (∫ θ f(X | θ) π(θ) dθ) / (∫ f(X | θ) π(θ) dθ)     if θ is continuous.


[Figure 7.1 shows the flow of Bayesian inference: the data X ~ f(X | θ) and the prior π(θ) are combined by the Bayes formula into the posterior π(θ | X), which is then used for estimation, testing, and other decisions.]

FIGURE 7.1: Posterior distribution is the basis for Bayesian inference.

The posterior mean is the conditional expectation of θ given the data X. In abstract terms, the Bayes estimator θ̂_Bayes is what we “expect” θ to be, after we observed a sample.

How accurate is this estimator? Among all estimators, θ̂ = E{θ | X} has the lowest squared-error posterior risk

E{(θ̂ − θ)² | X}

and also the lowest Bayes risk

E E(θ̂ − θ)²,

where this double expectation is taken over the joint distribution of X and θ.

For the Bayes estimator θ̂_Bayes, the posterior risk equals the posterior variance, which shows the variability of θ around θ̂_Bayes.

Example 7.1 (Normal case). The Bayes estimator of the mean θ of a Normal(θ, σ) distribution with a Normal(µ, τ) prior is

θ̂_Bayes = µ_x = (nX̄/σ² + µ/τ²) / (n/σ² + 1/τ²),

and its posterior risk is

τ_x² = 1 / (n/σ² + 1/τ²)

(Table 4.1). As we expect, this risk decreases to 0 as the sample size grows to infinity. ♦

Example 7.2 (Network blackouts, continued). After two weeks of data, the weekly rate of network blackouts, according to Example 4.2 on p. 52, has the Gamma posterior distribution with parameters α_x = 6 and λ_x = 3.

The Bayes estimator of the weekly rate θ is

θ̂_Bayes = E{θ | X} = α_x / λ_x = 2 (blackouts per week),

with a posterior risk of

Var{θ | X} = α_x / λ_x² = 0.6667. ♦

Although conjugate priors simplify our statistics, Bayesian inference can certainly be done for other priors too.

Example 7.3 (Quality inspection, continued). In Example 1.10 on p. 8, we computed the posterior distribution of the proportion of defective parts θ. This was a discrete distribution,

π(0.05 | X) = 0.2387;  π(0.10 | X) = 0.7613.

Now, the Bayes estimator of θ is

θ̂_Bayes = ∑_θ θ π(θ | X) = (0.05)(0.2387) + (0.10)(0.7613) = 0.0881.

It does not agree with the manufacturer (who claims θ = 0.05) or with the quality inspector (who feels that θ = 0.10), but its value is much closer to the inspector’s estimate.

The posterior risk of θ̂_Bayes is

Var{θ | X} = E{θ² | X} − E²{θ | X}
           = (0.05)²(0.2387) + (0.10)²(0.7613) − (0.0881)² = 0.0004,

which means a rather low posterior standard deviation of 0.02. ♦
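These numbers are easy to verify; here is a short R check (added, using the posterior probabilities above):

# Verifying Example 7.3: posterior mean and posterior risk of theta
theta <- c(0.05, 0.10)                   # possible values of the parameter
post  <- c(0.2387, 0.7613)               # posterior probabilities pi(theta | X)
m <- sum(theta * post)                   # Bayes estimator, 0.0881
v <- sum(theta^2 * post) - m^2           # posterior variance, about 0.0004
c(estimate = m, risk = v, st.dev = sqrt(v))   # posterior sd is about 0.02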

7.1.1 Precision evaluation

A parameter θ is estimated by an estimator δ = θˆ. How accurate is this decision? A frequentist measure of precision is the mean-squared error

MSE(δ) = E_θ(δ − θ)² = E_θ(δ − E_θ δ)² + (E_θ δ − θ)² = Var(δ) + {Bias(δ)}².

Here all the expectations are taken in the frequentist way, i.e., they integrate over all samples X, given a fixed parameter θ. Following the Bayesian approach, the same expectations are taken with respect to the posterior distribution of θ, given a fixed, already observed sample X. This is called the posterior variance of the estimator δ,

V_X(δ) = E^(π(θ|X)) (δ − θ)²
       = E{(θ − µ_x)² | X} + (δ − µ_x)²
       = τ_x² + (δ − µ_x)²,

where µ_x = E{θ | X} is the mean and τ_x² = Var{θ | X} is the variance of the posterior distribution π(θ | X).

There are two variance components in the posterior variance of δ. One is the posterior variance of the parameter θ, and the other is the squared deviation of estimator δ from the posterior mean µ_x. So, the total variability of our estimator consists of the variability of θ around its posterior mean and the distance between that posterior mean and our estimator.

Posterior variance:

Var(θ | X) = τ_x² = E{(θ − µ_x)² | X},   the posterior variance of the parameter θ (the variance of the posterior distribution of θ);

V_X(δ) = τ_x² + (δ − µ_x)²,   the posterior variance of the estimator δ.

Example 7.4 (Normal-Normal). For data X ~ Normal(θ, σ²) and prior π(θ) ~ Normal(µ, τ²), we derived the posterior variance in sec. 4.3.3,

τ_x² = 1 / (n/σ² + 1/τ²).

If we decide to estimate the population mean θ by the sample mean δ = X¯, it will be a biased estimator with respect to the posterior distribution, with the bias

δ − µ_x = X̄ − (nX̄/σ² + µ/τ²)/(n/σ² + 1/τ²) = (X̄ − µ)/(nτ²/σ² + 1).

Then, the sample mean has the posterior variance

V_X(X̄) = 1/(n/σ² + 1/τ²) + ((X̄ − µ)/(nτ²/σ² + 1))².

Depending on how well the data agree with the prior distribution, i.e., how close the sample mean and the prior mean are, this posterior variance of X̄ may exceed its mean squared error

MSE(X̄) = σ²/n. ♦

Example 7.5 (Noninformative prior). If in the previous example we choose the non-informative improper prior π(θ) ≡ 1, it cannot disagree with the data, because it carries no disagreeing information. As a result, we see that X̄ is unbiased not only in the frequentist sense but also with respect to the posterior distribution,

π(θ | X) ~ Normal(X̄, σ²/n).

In this case,

V_X(X̄) = σ²/n = MSE(X̄),

and the precision of δ = X̄ is the same in the Bayesian and in the frequentist sense. ♦

Example 7.6 (Gamma posterior; 4.14, 4.19g, 4.20i of Berger). In Example 4.4 on p. 58, we considered a Poisson(θ) sample X and the improper non-informative prior π(θ) = 1/θ, θ > 0. We figured that in this case, the posterior distribution is

π(θ | X) ~ Gamma(∑ Xᵢ, 1/n),

as long as X ≠ (0, ..., 0).

(a) Find the posterior mean and the posterior variance.

(b) Find the generalized maximum likelihood estimator of θ and its posterior variance.

·················· ♦

7.2 Bayesian credible sets

Bayesian and frequentist approaches to confidence estimation are quite different. In Bayesian analysis, having a posterior distribution of θ, we no longer have to explain the confidence level (1 − α) in terms of a long run of samples. Instead, we can give an interval [a, b] or a set C that has posterior probability (1 − α) and state that the parameter θ belongs to this set with probability (1 − α). Such a statement was impossible before we considered prior and posterior distributions. This set is called a (1 − α)100% credible set.

DEFINITION 7.1
Set C is a (1 − α)100% credible set for the parameter θ if the posterior probability for θ to belong to C equals (1 − α). That is,

Bayesian and frequentist approaches to confidence estimation are quite different. In Bayesian analysis, having a posterior distribution of θ, we no longer have to explain the confidence level (1 α) in terms of a long run of samples. Instead, we can give an interval [a,b] or a set − C that has a posterior probability (1 α) and state that the parameter θ belongs to this − set with probability (1 α). Such a statement was impossible before we considered prior − and posterior distributions. This set is called a (1 α)100% credible set. − DEFINITION 7.1 Set C is a (1 α)100% credible set for the parameter θ if the posterior probability for −θ to belong to C equals (1 α). That is, −

P θ C X = π(θ X)dθ =1 α. { ∈ | } | − ZC 88 STAT 618 Bayesian Statistics Lecture Notes π(θ X) ✻ |

c

✲θ

C FIGURE| 7.2: The{z(1 α)100%} highest posterior density credible set. −

Such a set is not unique. Recall that for two-sided, left-tail, and right-tail hypothesis testing, we took different portions of the area under the Normal curve, all equal to (1 − α).

To minimize the length of the set C among all (1 − α)100% credible sets, we just have to include all the points θ with high posterior density π(θ | X),

C = {θ : π(θ | X) ≥ c}

(see Figure 7.2). Such a set is called the highest posterior density credible set, or just the HPD set.

For the Normal(µ_x, τ_x) posterior distribution of θ, the (1 − α)100% HPD set is

µ_x ± z_{α/2} τ_x = [µ_x − z_{α/2} τ_x, µ_x + z_{α/2} τ_x].

Example 7.7 (Salaries). In Example 1.9 on p. 6, we “decided” that the most likely range for the mean starting salary θ of AU graduates is between $40,000 and $70,000. Expressing this in the form of a prior distribution, we let the prior mean be µ = (40,000 + 70,000)/2 = 55,000. Further, if we feel that the range [40,000; 70,000] is 95% likely, and we accept a Normal prior distribution for θ, then this range should equal

[40,000; 70,000] = µ ± z_{0.025} τ = µ ± 1.96 τ,

where τ is the prior standard deviation (Figure 7.3). It can be computed from the information given,

τ = (70,000 − 40,000) / (2 · 1.96) = 7,653.

This is the advantage of using a rich (two-parameter) family of prior distributions: we are likely to find a member of this family that reflects our prior beliefs adequately.

Then, prior to collecting any data, the 95% HPD credible set of the mean starting salary θ is µ ± z_{0.025} τ = [40,000; 70,000].

Suppose a random sample of 100 graduates has the mean starting salary X̄ = 48,000 with a sample standard deviation s = 12,000. Then we compute the posterior mean and standard deviation,

µ_x = (nX̄/σ² + µ/τ²) / (n/σ² + 1/τ²)
    = ((100)(48,000)/(12,000)² + (55,000)/(7,653)²) / (100/(12,000)² + 1/(7,653)²)
    = 48,168;

τ_x = 1/√(n/σ² + 1/τ²) = 1/√(100/(12,000)² + 1/(7,653)²) = 1,186.

We used the sample standard deviation s in place of the population standard deviation σ, assuming that a sample of size 100 estimates the latter rather accurately. Alternatively, we could put a prior distribution on the unknown σ too and estimate it by Bayesian methods. Since the observed sample mean is smaller than our prior mean, the resulting posterior distribution is shifted to the left of the prior (Figure 7.4).

Conclusion. After seeing the data, the Bayes estimator of the mean starting salary of AU graduates is

θ̂_Bayes = µ_x = 48,168 dollars,

and the 95% HPD credible set for this mean salary is

µ_x ± z_{0.025} τ_x = 48,168 ± (1.96)(1,186) = 48,168 ± 2,324 = [45,844; 50,492].

The lower-than-predicted starting salaries pulled the posterior interval below the prior mean, while the large sample made it much narrower than the prior range. ♦
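A short R sketch (added here as a check of the posterior computations above):

# Example 7.7: Normal-Normal posterior mean, sd, and the 95% HPD credible set
mu <- 55000
tau <- (70000 - 40000) / (2 * qnorm(0.975))   # prior sd, about 7,653
n <- 100; xbar <- 48000; s <- 12000           # s used in place of sigma
prec <- n/s^2 + 1/tau^2                       # posterior precision
mu_x <- (n*xbar/s^2 + mu/tau^2) / prec        # posterior mean, about 48,168
tau_x <- 1/sqrt(prec)                         # posterior sd, about 1,186
mu_x + c(-1, 1) * qnorm(0.975) * tau_x        # 95% HPD set, about [45,844; 50,492]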

FIGURE 7.3: Normal prior distribution and the 95% HPD credible set for the mean starting salary of Computer Science graduates (Example 7.7).

FIGURE 7.4: Normal prior and posterior distributions for the mean starting salary (Example 7.7).

Example 7.8 (Telephone company). A new telephone company expects to handle an average of 1000 calls per hour. During 10 randomly selected hours of operation, it handled 7265 calls.

How should it update the initial estimate of the frequency of telephone calls? Construct a 95% HPD credible set. Telephone calls are placed according to a Poisson process. An Exponential prior distribution of the hourly rate of calls is applicable.

Solution. We need a Bayesian estimator of the frequency θ of telephone calls. The number of calls during 1 hour has the Poisson(θ) distribution, where θ is unknown, with the

Exponential(λ) = Gamma(1, λ)

prior distribution that has an expectation of

E(θ) = 1/λ = 1000 calls.

Hence, λ = 0.001. We observe a sample of size n = 10, totaling

∑_{i=1}^{n} Xᵢ = nX̄ = 7265 calls.

As we know (see Table 4.1 on p. 54), the posterior distribution in this case is Gamma(α_x, λ_x) with

α_x = α + nX̄ = 7266,  λ_x = λ + n = 10.001.

This distribution has mean

µ_x = α_x / λ_x = 726.53

and standard deviation

τ_x = √α_x / λ_x = 8.52.

The Bayes estimator of θ is

E(θ | X) = µ_x = 726.53 calls per hour.

It almost coincides with the sample mean X̄, showing that the sample was informative enough to dominate over the prior information.

For the credible set, we notice that α_x is sufficiently large to make the Gamma posterior distribution approximately equal to the Normal distribution with parameters µ_x and τ_x. The 95% HPD credible set is then

µ_x ± z_{0.05/2} τ_x = 726.53 ± (1.96)(8.52) = 726.53 ± 16.71 = [709.82; 743.24]. ♦
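As an added check, R can put the exact Gamma posterior quantiles next to this Normal approximation:

# Example 7.8: exact Gamma posterior for the hourly call rate
alpha_x <- 1 + 7265; lambda_x <- 0.001 + 10   # posterior shape and rate
alpha_x / lambda_x                            # posterior mean, 726.53
sqrt(alpha_x) / lambda_x                      # posterior sd, about 8.52
qgamma(c(0.025, 0.975), shape = alpha_x, rate = lambda_x)  # exact central 95% set
# compare with the Normal approximation 726.53 +/- 16.71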

All the HPD credible sets are Bayes decisions under the loss function

L(θ, C) = λ|C| + I{θ ∉ C},

where |C| is the size, usually the length, of the set C, and λ is a coefficient. This loss requires a reasonably narrow set, but it penalizes us if the set does not contain the actual parameter. Instead of satisfying a given confidence level α, the Bayes set finds the optimal balance between the expected size |C| of the set and the probability P{θ ∉ C} of missing the parameter.

Example 7.9 (Beta posterior). Two independent Bernoulli(θ) variables are observed, X1 = 0 and X2 = 1. Under the Uniform(0, 1) prior distribution,

(a) compute the 80% HPD credible set for the parameter θ;

(b) obtain the Bayes confidence set under the loss function L(θ, C) = 0.2|C| + I(θ ∉ C).

·················· ♦

7.3 Bayesian hypothesis testing

Bayesian hypothesis testing is very easy to interpret. We can compute the prior and posterior probabilities for the hypothesis H0 and the alternative HA to be true, and decide from there which one to accept or to reject.

Computing such probabilities was not possible without prior and posterior distributions of the parameter θ. In non-Bayesian statistics, θ was not random, and thus H0 and HA were either true (with probability 1) or false (with probability 1).

For Bayesian tests, in order for H0 and HA to have meaningful, non-zero probabilities, they often represent disjoint sets of parameter values,

H0: θ ∈ Θ0  vs  HA: θ ∈ Θ1

(which makes sense because exact equality θ = θ0 is unlikely to hold anyway, and in practice it is understood as θ ≈ θ0).

Comparing the posterior probabilities of H0 and HA,

P{Θ0 | X}  and  P{Θ1 | X},

we decide whether P{Θ1 | X} is large enough to present significant evidence and to reject the null hypothesis. One can again compare it with (1 − α), such as 0.90, 0.95, or 0.99, or state the result in terms of likelihood: “the null hypothesis is this much likely to be true.”

Example 7.10 (Telephone company, continued). Let us test whether the telephone company in Example 7.8 can actually face a call rate of 1000 calls or more per hour. We are testing

H0: θ ≥ 1000  vs  HA: θ < 1000,

where θ is the hourly rate of telephone calls.

According to the Gamma(α_x, λ_x) posterior distribution of θ and its Normal(µ_x = 726.53, τ_x = 8.52) approximation,

P{H0 | X} = P{(θ − µ_x)/τ_x ≥ (1000 − µ_x)/τ_x} = 1 − Φ(32.1) ≈ 0.

By the complement rule, P{HA | X} ≈ 1, and this presents sufficient evidence against H0. We conclude that it is extremely unlikely for this company to face a frequency of 1000+ calls per hour. ♦
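The same posterior probability can also be computed exactly from the Gamma posterior, without the Normal approximation (an added R check):

# Exact posterior probability of H0: theta >= 1000
1 - pgamma(1000, shape = 7266, rate = 10.001)   # essentially zero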

7.3.1 Zero-wi loss function

Often one can anticipate the consequences of Type I and Type II errors in hypothesis testing and assign a loss L(θ,a) associated with each possible error. Here θ is the parameter, and a is our action, the decision on whether we accept or reject the null hypothesis.

Each decision then has its posterior risk ρ(a), defined as the expected loss computed under the posterior distribution. The action with the lower posterior risk is our Bayes decision.

Suppose that the Type I error causes the loss

w = Loss(Type I error) = L(θ, reject H ), for θ Θ 0 0 ∈ 0 and the Type II error causes the loss

w = Loss(Type II error) = L(θ, accept H ), for θ Θ 1 0 ∈ 1

This is a zero-wi loss function, and it generalizes the zero-one loss.

Posterior risks of each possible action are then computed as

ρ(π, reject H0 | X) = w0 π(Θ0 | X),
ρ(π, accept H0 | X) = w1 π(Θ1 | X).

Now we can determine the Bayes action, the one with the lower posterior risk: if w0 π(Θ0 | X) ≤ w1 π(Θ1 | X), the Bayes action is to reject H0; if w0 π(Θ0 | X) ≥ w1 π(Θ1 | X), the Bayes action is to accept H0.

Example 7.11 (Quality inspection, continued). In Example 1.10 on p. 8, we are testing

H0: θ = 0.05  vs  HA: θ = 0.10

for the proportion θ of defective parts. Suppose that the Type I error is three times as costly here as the Type II error. What is the Bayes action in this case?

Example 7.3 gives the posterior probabilities

π(Θ0 | X) = 0.2387  and  π(Θ1 | X) = 0.7613.

Since w0 = 3w1, the posterior risks are

ρ(reject H0) = w0 π(Θ0 | X) = 3w1 (0.2387) = 0.7161 w1,
ρ(accept H0) = w1 π(Θ1 | X) = 0.7613 w1.

Thus, rejecting H0 has the lower posterior risk, and therefore it is the Bayes action. Reject H0. ♦
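In R, this risk comparison is a one-liner (an added illustration; set w1 = 1 without loss of generality):

# Posterior risks in Example 7.11, taking w1 = 1 (so w0 = 3)
w0 <- 3; w1 <- 1
c(reject = w0 * 0.2387, accept = w1 * 0.7613)   # rejecting H0 has the lower risk, 0.7161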

7.3.2 Bayesian classification

The Bayesian approach to hypothesis testing naturally generalizes to the case of more than two hypotheses. Instead of classifying the unknown parameter into either Θ0 (accept H0) or Θ1 (reject H0), it can be classified into one of k disjoint subsets of Θ,

H1: θ ∈ Θ1,  H2: θ ∈ Θ2,  ...,  Hk: θ ∈ Θk.

A loss function will then include L(θ, δ) = w_ij, a penalty for accepting hypothesis Hj when hypothesis Hi is true.

Similarly, we can include the option of making no decision and concluding that there is not enough information for or against either hypothesis. This also carries a pre-determined penalty w_0j, and sometimes “no decision” may be optimal compared with the penalty of accepting a wrong hypothesis.

7.3.3 Bayes factors

A popular tool for Bayesian hypothesis testing is the Bayes factor.

DEFINITION 7.2
The Bayes factor is defined as

B = (π(Θ0 | X) / π(Θ1 | X)) / (π(Θ0) / π(Θ1)),

the posterior odds ratio divided by the prior odds ratio.

Often Bayes factors are quite stable, insensitive to the choice of the prior probabilities π(Θ0) and π(Θ1).

How do we use Bayes factors for hypothesis testing? Given the Bayes factor B, anyone can multiply it by her or his own prior odds ratio π(Θ0)/π(Θ1), obtain the posterior odds ratio

π(Θ0 | X) / π(Θ1 | X) = {π(Θ0) / π(Θ1)} B,

and use it to decide between H0 and H1.

Example 7.12 (Simple vs simple). Suppose Θ0 = {θ0} and Θ1 = {θ1} each consist of just one parameter value. In this case, the posterior probabilities are

π(θ0 | X) = f(X | θ0) π(θ0) / (f(X | θ0) π(θ0) + f(X | θ1) π(θ1)),
π(θ1 | X) = f(X | θ1) π(θ1) / (f(X | θ0) π(θ0) + f(X | θ1) π(θ1)),

and the Bayes factor is

B = (f(X | θ0) π(θ0) / (f(X | θ1) π(θ1))) / (π(θ0) / π(θ1)) = f(X | θ0) / f(X | θ1).

The Bayes factor is simply the likelihood ratio! And yes, it is perfectly insensitive to the prior probabilities in this case. ♦
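For instance (an added sketch with a made-up observation), for one Normal(θ, 1) observation and the simple hypotheses θ0 = 0 vs θ1 = 1:

# Bayes factor for two simple hypotheses: just the likelihood ratio
x <- 0.3                                 # hypothetical single observation
B <- dnorm(x, 0, 1) / dnorm(x, 1, 1)     # f(x | theta0) / f(x | theta1)
B                                        # about 1.22, mildly favoring theta0 = 0
# posterior odds = prior odds * B, for whatever prior odds one holds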

7.4 Bayesian prediction

In this section, we are going to predict a random variable Z that is somehow related to the unknown parameter θ. Here is how the problem is formulated.

Bayesian prediction problem:
– A sample X = (X1, ..., Xn) is observed from f(X | θ).
– The unknown parameter θ has a prior distribution π(θ).
– A random variable Z has distribution g(z | θ).
– Find the predictive density of Z, which is p(z | X), the distribution of Z given the observed X.

Example 7.13 (Electricity prices). This is a practical example. On Fig. 7.5, you can see predictive densities of spot (instantaneous) electricity prices for each day of the next year and 5 years ahead1. Such predictions are widely used for option valuation, estimating the value and risk of various financial instruments involving commodity prices. ♦

As always in Bayesian analysis, the solution to the Bayesian prediction problem is based on the posterior distribution of θ.

1From M. Rosenberg, J. Bryngelson, M. Baron, and A. Papalexopoulos. Transmission Valuation Analysis Based on Real Options with Price Spikes, in Handbook of Power Systems, Springer, 2010, pp. 101-125

FIGURE 7.5: Predictive densities of electricity prices.

First, combine the posterior of θ with the distribution of Z given θ to obtain the joint distribution of Z and θ,

g(z, θ | X) = g(z | θ) π(θ | X).

Then, integrate the joint density over θ to obtain the marginal distribution of Z,

p(z | X) = ∫ g(z, θ | X) dθ = ∫ g(z | θ) π(θ | X) dθ.

This is the predictive density of Z. This integral also represents the expectation E{g(z | θ) | X}, taken with respect to the posterior distribution of θ.

Predictive density:  p(z | X) = ∫ g(z | θ) π(θ | X) dθ = E{g(z | θ) | X}

If X and Z are not independent given θ, then the density of Z to be used is g(z | θ, X).

Example 7.14 (Poisson + Gamma = Negative Binomial). Let’s predict the number of accidents Z on a new segment of a highway during one year. It has a Poisson(θ) distribution, but we do not know the parameter θ. On the other hand, we observed the accident counts

X1,...,Xn on n existing similar segments of this highway, and presumably, they are also Poisson(θ). The prior distribution of θ is chosen to belong to the conjugate Gamma(α, λ) family. What is the predictive distribution of Z?

We already know, from sec. 4.3.1, that the posterior distribution for this situation is Gamma(α_x, λ_x),

π(θ | X) ∼ θ^(α_x − 1) e^(−λ_x θ),  where α_x = α + ∑_{i=1}^{n} Xᵢ and λ_x = λ + n.

The variable Z is discrete, so we are looking for its probability mass function p(z | X) = P{Z = z | X}. Given θ, Z is Poisson(θ), i.e.,

g(z | θ) = e^(−θ) θ^z / z!

(but we cannot use this formula for predictions because θ is unknown). Notice that I am not dropping the “constant” z!, because we are looking for the distribution of Z, so we need to keep the terms containing not only θ but also z.

Then, the predictive distribution is

p(z | X) = ∫ g(z | θ) π(θ | X) dθ
         ∼ ∫₀^∞ (e^(−θ) θ^z / z!) · θ^(α_x − 1) e^(−λ_x θ) dθ
         = (1/z!) ∫₀^∞ e^(−θ(λ_x + 1)) θ^(z + α_x − 1) dθ.

Do you know how to take this integral in one step, without knowing any integration techniques? Just notice that we are integrating a function e^(−θ(λ_x+1)) θ^(z+α_x−1) that is very similar to a density of the Gamma distribution. It just needs a normalizing constant to be a full density, and then we know that the integral of a density is 1. All right, we can insert the needed constant, whose form we can find in the inventory of distributions in the Appendix:

p(z | X) ∼ (1/z!) ∫₀^∞ e^(−θ(λ_x+1)) θ^(z+α_x−1) dθ
         = Γ(z + α_x) / (z! (λ_x + 1)^(z+α_x)) · ∫₀^∞ ((λ_x + 1)^(z+α_x) / Γ(z + α_x)) e^(−θ(λ_x+1)) θ^(z+α_x−1) dθ
         = Γ(z + α_x) / (z! (λ_x + 1)^(z+α_x))
         ∼ (Γ(z + α_x) / z!) · (1/(λ_x + 1))^z,  for z = 0, 1, 2, ...,

because the integral of any density over its full range equals 1.

Last step. Look through the list of discrete distributions to find one that looks like this p(z | X). It contains a factorial z!, a Gamma function that is also a factorial, Γ(z + α_x) = (z + α_x − 1)! (see the Calculus review in the Appendix), and a small number raised to the power z. Also, the range of Z is z ∈ {0, 1, 2, ...} because Z is Poisson, given θ.

That’s it! Have you recognized the Negative Binomial distribution? Let us write p(z | X) this way,

p(z | X) ∼ ((z + α_x − 1)! / z!) · (1/(λ_x + 1))^z
         ∼ ((z + α_x − 1)! / (z! (α_x − 1)!)) · (1/(λ_x + 1))^z (1 − 1/(λ_x + 1))^(α_x)
         = C(z + α_x − 1, z) p^z (1 − p)^(α_x),

where p = 1/(λ_x + 1). Not exactly the Negative Binomial distribution as it is given in the Appendix? Well, this is the distribution of the number of failures before the α_x-th success.

We conclude that the predictive distribution of Z, the number of accidents on a new segment of the highway, is Negative Binomial (the number of failures before the given number of successes) with parameters

α_x = α + ∑ Xᵢ  and  p = 1/(λ_x + 1) = 1/(λ + n + 1),

and the probability mass function

p(z | X) = ((z + α + ∑Xᵢ − 1)! / (z! (α + ∑Xᵢ − 1)!)) (1/(λ + n + 1))^z (1 − 1/(λ + n + 1))^(α + ∑Xᵢ),  for z = 0, 1, 2, .... ♦
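A simulation sketch (added; the prior values α = 2, λ = 1 and the data summary below are hypothetical) confirms that mixing Poisson over the Gamma posterior gives exactly this Negative Binomial:

# Predictive pmf: Poisson mixed over the Gamma posterior equals Negative Binomial
set.seed(1)
alpha <- 2; lambda <- 1                # hypothetical prior parameters
n <- 5; sumX <- 12                     # hypothetical total count on n segments
alpha_x <- alpha + sumX; lambda_x <- lambda + n
theta <- rgamma(100000, shape = alpha_x, rate = lambda_x)
Z <- rpois(100000, theta)              # draws from the predictive distribution
sim <- table(factor(Z, levels = 0:5)) / 100000
exact <- dnbinom(0:5, size = alpha_x, prob = lambda_x / (lambda_x + 1))
round(rbind(sim, exact), 4)            # the two rows agree up to MC error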

Example 7.15 (Prediction in regression). Consider a no-intercept regression model

Yᵢ = β xᵢ + εᵢ,  where εᵢ are i.i.d. Normal(0, σ²).

Independent variables xᵢ are fixed, responses Yᵢ are observed, and the slope β is an unknown parameter. Our goal is to predict a response Z for a new regressor x0. Assume the non-informative prior π(β) ≡ 1.

From regression, we might know that the slope is estimated by

b = ∑ xᵢ Yᵢ / ∑ xᵢ².

Given β, the distribution of b is Normal(β, σ²/∑ xᵢ²),

f(b | β) ∼ e^(−(b − β)² / (2σ²/∑ xᵢ²)).

Then, as in sec. 4.4, the posterior distribution of β is Normal(b, σ²/∑ xᵢ²),

π(β | Y) = π(β | b) ∼ f(b | β) · 1 ∼ e^(−(β − b)² / (2σ²/∑ xᵢ²)).

Also, for a new value x0, the response Z is Normal(βx0, σ²), according to our regression model,

g(z | β) ∼ e^(−(z − βx0)² / (2σ²)).

Following the formula for the predictive density, we have to integrate

p(z | Y) = ∫ g(z | β) π(β | Y) dβ ∼ ∫_{−∞}^{+∞} e^(−(z − βx0)²/(2σ²)) e^(−(β − b)²/(2σ²/∑ xᵢ²)) dβ,

which is certainly possible but long. Instead, as a shortcut, we notice that the joint density g(z, β | Y) that we are integrating here is Bivariate Normal. Therefore, the predictive distribution of Z is Normal, and we only need to find its parameters:

E{Z | b} = E(E{Z | β} | b) = E(βx0 | b) = b x0,

Var{Z | b} = E(Var{Z | β} | b) + Var(E{Z | β} | b)
           = E(σ² | b) + Var(βx0 | b)
           = σ² + x0² σ²/∑ xᵢ²
           = σ² (1 + x0²/∑ xᵢ²).

Now we can use this predictive distribution to test hypotheses about the new response Z, construct an HPD predictive credible set for it, etc. ♦
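A brief R sketch of this predictive distribution (added; simulated data, with σ assumed known):

# Predictive distribution of a new response Z at x0 (no-intercept model, sigma known)
set.seed(2)
sigma <- 1; beta <- 2                      # hypothetical true values
x <- runif(20, 0, 3)
Y <- beta * x + rnorm(20, 0, sigma)        # simulated responses
b <- sum(x * Y) / sum(x^2)                 # estimated slope
x0 <- 2.5                                  # new regressor value
m <- b * x0                                # predictive mean  E{Z | b}
v <- sigma^2 * (1 + x0^2 / sum(x^2))       # predictive variance Var{Z | b}
m + c(-1, 1) * qnorm(0.975) * sqrt(v)      # 95% HPD predictive set for Z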

Reading: Berger 4.3; Christensen 2.4, 4.8

Chapter 8

Monte Carlo methods

8.1 Applications and examples ...... 100 8.2 Estimating probabilities ...... 102 8.3 Estimating means and standard deviations ...... 104 8.4 Forecasting ...... 105 8.5 Monte Carlo integration ...... 105 8.6 Markov Chains Monte Carlo ...... 107 8.7 Gibbs Sampler ...... 108 8.8 Central Limit Theorem for the Posterior Distribution ...... 111 8.8.1 Preparation ...... 111 8.8.2 Main result, asymptotic normality ...... 112

In this chapter, we discuss the general use of Monte Carlo methodologies. In general, people use random computer-generated simulations in order to estimate certain quantities: probabilities, expectations, variances, and sometimes entire densities.

Computer simulations refer to a regeneration of a process by writing a suitable computer program and observing its results. Monte Carlo methods are those based on computer simulations involving random numbers.

The main purpose of simulations is estimating quantities whose direct computation is complicated, risky, time-consuming, expensive, or impossible. For example, suppose a complex device or machine is to be built and launched. Before that happens, its performance is simulated, and this allows experts to evaluate its adequacy and associated risks carefully and safely. For example, one surely prefers to evaluate the reliability and safety of a new module of a space station by means of computer simulations rather than during the actual mission.

Monte Carlo methods are mostly used for the computation of probabilities, expected values, and other distribution characteristics. Recall that probability can be defined as a long-run proportion. With the help of random number generators, computers can actually simulate a long run. Then, probability can be estimated by a mere computation of the associated observed frequency. The longer the simulated run, the more accurate the result. Similarly, one can estimate expectations, variances, and other distribution characteristics from a long run of simulated random variables.


Monte Carlo methods inherit their name from Europe’s most famous casino, the Monte Carlo casino (Figure 8.1), located in the Principality of Monaco since the 1850s. Probability distributions involved in gambling are often complicated, but they can be assessed via simulations. In early times, probabilists generated extra income by estimating vital probabilities, devising optimal gambling strategies, and selling them to professional gamblers.

FIGURE 8.1: Casino Monte Carlo in the Principality of Monaco.

8.1 Applications and examples

Let us briefly look at a few examples, where distributions are rather complicated, and thus, Monte Carlo simulation appears simpler than any direct computation. Notice that in these examples, instead of generating the actual devices, computer systems, networks, viruses, and so on, we only simulate the associated random variables. For the study of probabilities and distributions, this is entirely sufficient.

Example 8.1 (Forecasting). Given just a basic distribution model, it is often very difficult to make reasonably remote predictions. Often a one-day development depends on the results obtained during all the previous days. Then prediction for tomorrow may be straightforward whereas computation of a one-month forecast is already problematic.

On the other hand, simulation of such a process can be easily performed day by day (or even minute by minute). Based on present results, we simulate the next day. Now we “know it,” and thus we can simulate the day after that, etc. For every time n, we simulate X_{n+1} based on the already known X1, X2, ..., Xn. Controlling the length of this do-loop, we obtain forecasts for the next days, weeks, or months. Such simulations result, for example, in beautiful animated weather maps that meteorologists often show to us on TV news. They help predict future paths of storms and hurricanes as well as possibilities of flash floods.

Simulation of future failures reflects reliability of devices and systems. Simulation of future stock and commodity prices plays a crucial role in finance, as it allows valuations of options and other financial deals. ♦

Example 8.2 (Percolation). Consider a network of nodes. Some nodes are connected, say, with transmission lines, others are not (mathematicians would call such a network a graph). A signal is sent from a certain node. Once a node k receives a signal, it sends it along each of its output lines with some probability p_k. After a certain period of time, one desires to estimate the proportion of nodes that received a signal, the probability for a certain node to receive it, etc.

This general percolation model describes the way many phenomena may spread. The role of a signal may be played by a computer virus spreading from one computer to another, or by rumors spreading among people, or by fire spreading through a forest, or by a disease spreading between residents.

Technically, simulation of such a network reduces to generating Bernoulli random variables with parameters p_i. Line i transmits if the corresponding generated variable X_i = 1. In the end, we simply count the number of nodes that got the signal, or verify whether the given node received it. ♦

Example 8.3 (Queuing). A queuing system is described by a number of random variables. It involves spontaneous arrivals of jobs, their random waiting time, assignment to servers, and finally, their random service time and departure. In addition, some jobs may exit prematurely, others may not enter the system if it appears full, and also, the intensity of the incoming traffic and the number of servers on duty may change during the day.

When designing a queuing system or a server facility, it is important to evaluate its vital performance characteristics. These will include the job’s average waiting time, the average length of a queue, the proportion of customers who had to wait, the proportion of “unsatisfied customers” (those that exit prematurely or cannot enter), the proportion of jobs spending, say, more than one hour in the system, the expected usage of each server, the average number of available (idle) servers at the time when a job arrives, and so on. ♦

Example 8.4 (Markov chain Monte Carlo). There is a modern technique of generating random variables from rather complex, often intractable distributions, as long as conditional distributions have a reasonably simple form. In the semiconductor industry, for example, the joint distribution of good and defective chips on a produced wafer has a rather complicated correlation structure. As a result, it can only be written explicitly for rather simplified artificial models. On the other hand, the quality of each chip is predictable based on the quality of the surrounding, neighboring chips. Given its neighborhood, the conditional probability for a chip to fail can be written, and thus its quality can be simulated by generating a corresponding Bernoulli random variable with X_i = 1 indicating a failure.

According to the Markov chain Monte Carlo (MCMC) methodology, a long sequence of random variables is generated from conditional distributions. A wisely designed MCMC will then produce random variables that have the desired unconditional distribution, no matter how complex it is. ♦

In all the examples, we saw how different types of phenomena can be computer-simulated. However, one simulation is not enough for estimating probabilities and expectations. After we understand how to program the given phenomenon once, we can embed it in a do-loop and repeat similar simulations a large number of times, generating a long run. Since the simulated variables are random, we will generally obtain a number of different realizations, from which we calculate probabilities and expectations as long-run frequencies and averages.

8.2 Estimating probabilities

This section discusses the most basic and most typical application of Monte Carlo methods. Keeping in mind that probabilities are long-run proportions, we generate a long run of experiments and compute the proportion of times when our event occurred.

For a random variable X, the probability p = P{X ∈ A} is estimated by

p̂ = P̂{X ∈ A} = (number of X1, ..., XN ∈ A) / N,

where N is the size of the Monte Carlo experiment, and X1, ..., XN are generated random variables with the same distribution as X.

How accurate is this method? To answer this question, compute E(p̂) and Std(p̂). Since the number of X1, ..., XN that fall within set A has the Binomial(N, p) distribution with expectation Np and variance Np(1 − p), we obtain

E(p̂) = (1/N)(Np) = p,  and

Std(p̂) = (1/N)√(Np(1 − p)) = √(p(1 − p)/N).

The first result, E(p̂) = p, shows that our Monte Carlo estimator of p is unbiased, so that over a long run, it will on the average return the desired quantity p.

The second result, Std(p̂) = √(p(1 − p)/N), indicates that the standard deviation of our estimator p̂ decreases with N at the rate of 1/√N. Larger Monte Carlo experiments produce more accurate results. A 100-fold increase in the number of generated variables reduces the standard deviation (therefore enhancing accuracy) by a factor of 10.
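For example (an added sketch), estimating p = P{X > 2} for a Standard Normal X together with the standard error of p̂:

# Monte Carlo estimate of p = P{X > 2} for X ~ Normal(0,1), with its standard error
set.seed(3)
N <- 100000
X <- rnorm(N)
phat <- mean(X > 2)                  # long-run proportion; exact value 1 - pnorm(2) = 0.0228
se <- sqrt(phat * (1 - phat) / N)    # estimated Std(phat) = sqrt(p(1-p)/N)
c(estimate = phat, std.error = se)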

Accuracy of a Monte Carlo study

In practice, how does it help to know the standard deviation of p̂?

First, we can assess the accuracy of our results. For large N, we use the Normal approximation of the Binomial distribution of Np̂. According to it,

(Np̂ − Np)/√(Np(1 − p)) = (p̂ − p)/√(p(1 − p)/N) ≈ Normal(0, 1),

therefore,

P{|p̂ − p| > ε} = P{ |p̂ − p|/√(p(1 − p)/N) > ε/√(p(1 − p)/N) } ≈ 2Φ( −ε√N / √(p(1 − p)) ),   (8.1)

where Φ is the Standard Normal cdf.

Second, we can design a Monte Carlo study that attains the desired accuracy. That is, we can choose some small ε and α and conduct a Monte Carlo study of such a size N that guarantees an error not exceeding ε with high probability (1 − α). In other words, we can find N such that

P{|p̂ − p| > ε} ≤ α.   (8.2)

If we knew the value of p, we could equate the right-hand side of (8.1) to α and solve the resulting equation for N. This would show how many Monte Carlo simulations are needed in order to achieve the desired accuracy with the desired high probability. However, p is unknown (if p were known, why would we need a Monte Carlo study to estimate it?). Then we have two possibilities:

FIGURE 8.2: The function p(1 − p) attains its maximum value 0.25 at p = 0.5.

1. Use an “intelligent guess” (preliminary estimate) of p, if it is available.

2. Bound p(1 − p) by its largest possible value (see Figure 8.2),

p(1 − p) ≤ 0.25  for 0 ≤ p ≤ 1.

In the first case, if p* is an “intelligent guess” of p, then we solve the inequality

2Φ( −ε√N / √(p*(1 − p*)) ) ≤ α

in terms of N. In the second case, we solve

2Φ(−2ε√N) ≤ α

to find the conservatively sufficient size of the Monte Carlo study.

Solutions of these inequalities give us the following rule. 104 STAT 618 Bayesian Statistics Lecture Notes

Size of a Monte Carlo study. In order to guarantee that P{|p̂ − p| > ε} ≤ α, one needs to simulate

N ≥ p*(1 − p*) (z_{α/2}/ε)²

random variables, where p* is a preliminary estimator of p, or

N ≥ 0.25 (z_{α/2}/ε)²

random variables if no such estimator is available.
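For instance (an added one-line check), with ε = 0.01 and α = 0.05 and no preliminary estimator of p:

# Conservative Monte Carlo size for |phat - p| <= 0.01 with probability 0.95
eps <- 0.01; alpha <- 0.05
z <- qnorm(1 - alpha/2)              # z_{alpha/2} = 1.96
ceiling(0.25 * (z / eps)^2)          # 9604 simulations suffice without a guess for p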

Recall that z_α = Φ⁻¹(1 − α) is the value of a Standard Normal variable Z that is exceeded with probability α; the area under the Normal curve to the right of z_α equals α (Figure 8.3).

FIGURE 8.3: Critical value z_α.

If this formula returns an N that is too small for the Normal approximation, we can use Chebyshev’s inequality. We then obtain that

N ≥ p*(1 − p*)/(α ε²),   (8.3)

if an “intelligent guess” p* is available for p, and

N ≥ 1/(4 α ε²),   (8.4)

otherwise, satisfying the desired condition (8.2).

8.3 Estimating means and standard deviations

Estimation of means, standard deviations, and other distribution characteristics is based on the same principle. We generate a Monte Carlo sequence of random variables X1, ..., XN and compute the necessary long-run averages. The mean µ = E(X) is estimated by

X̄ = (1/N)(X1 + ... + XN),

and we know that this estimator X̄ is unbiased for estimating µ. Its standard deviation is Std(X̄) = σ/√N, and it decreases like 1/√N. For large N, we can use the Central Limit Theorem again to assess the accuracy of our results and to design a Monte Carlo study that attains the desired accuracy (the latter is possible if we have a “guess” about σ). Also, naturally, the variance of X will be estimated by the Monte Carlo sample variance,

s² = (1/(N − 1)) ∑_{i=1}^{N} (Xᵢ − X̄)².

8.4 Forecasting

Forecasting the future is often an attractive but difficult task. In order to predict what happens in t days, one usually has to study the events occurring every day between now and t days from now. It is often found that tomorrow’s events depend on today, yesterday, etc. In such common situations, exact computation of probabilities may be long and difficult. However, Monte Carlo forecasts are feasible.

The currently known information can be used to generate random variables for tomorrow. Then they can be considered “known” information, and we can generate random variables for the day after tomorrow, and so on, until day t.

The rest is similar to the previous sections. We generate N Monte Carlo runs for the next t days and estimate probabilities, means, and standard deviations of day-t variables by long-run averaging. In addition to forecasts for a specific day t, we can predict how long a certain process will last, when a certain event will occur, and how many events will occur.

8.5 Monte Carlo integration

We can extend the Monte Carlo method to estimating definite integrals and the areas below or above the graphs of corresponding functions. An R code for estimating the integral

I = ∫₀¹ g(x) dx

is

N <- 10000              # number of simulations
U <- runif(N, 0, 1)     # (U,V) is a random point
V <- runif(N, 0, 1)     # in the bounding box
I <- mean(V < g(U))     # estimator of integral I

The expression V < g(U) returns a vector of N zeros and ones. Each component equals 1 if the inequality holds for the given pair (Uᵢ, Vᵢ), so that this point appears below the graph of g(x), and 0 otherwise. The average of these 0s and 1s, calculated by mean, is the proportion of 1s, and this is the Monte Carlo estimator of the integral I.

Remark: This code assumes that the function g is already defined by the user. Otherwise, its full expression should be written in place of g(U).

Remark: We also assumed that 0 ≤ x ≤ 1 and 0 ≤ g(x) ≤ 1. If not, we transform U and V into non-standard Uniform variables X = a +(b − a)U and Y = cV , as in the rejection method, then the obtained integral should be multiplied by the box area c(b − a).

Accuracy of results

So far, we estimated lengths, areas, volumes, and integrals by long-run proportions, the method described in Section 8.2. As we noted there, our estimates are unbiased, and their standard deviation is

Std(Î) = √( I(1 − I)/N ),   (8.5)

where I is the actual quantity of interest.

It turns out that there are Monte Carlo integration methods that can beat this rate. Next, we derive an unbiased area estimator with a lower standard deviation. Also, it will not be restricted to the interval [0, 1] or even [a, b].

Improved Monte Carlo integration method

First, we notice that a definite integral

I = ∫_a^b g(x) dx = ∫_a^b (b − a) g(x) · (1/(b − a)) dx = E{(b − a) g(X)}

equals the expectation of (b − a)g(X) for a Uniform(a, b) variable X. Hence, instead of using proportions, we can estimate I by averaging (b − a)g(Xᵢ) for some large number of Uniform(a, b) variables X1, ..., XN. Furthermore, with a proper adjustment, we can use any continuous distribution in place of Uniform(a, b). To do this, we choose some density f(x) and write I as

I = ∫_a^b g(x) dx = ∫_a^b (g(x)/f(x)) f(x) dx = E{ g(X)/f(X) },

where X has density f(x). It remains to generate a long run of variables X1, ..., XN with this density and compute the average of g(Xᵢ)/f(Xᵢ). In particular, we are no longer limited to a finite interval [a, b]. For example, by choosing f(x) to be a Standard Normal density, we can perform Monte Carlo integration from

a = −∞ to b = +∞:

N <- 10000              # number of simulations
Z <- rnorm(N, 0, 1)     # Standard Normal variables
f <- dnorm(Z, 0, 1)     # Standard Normal density
Iest <- mean(g(Z)/f)    # estimator of the integral of g over (-Inf, +Inf)

Accuracy of the improved method

As we already know, using long-run averaging returns an unbiased estimator Î, hence E(Î) = I. We also know that

Std(Î) = σ/√N,

where σ is the standard deviation of the random variable R = g(X)/f(X). Hence, the estimator Î is more reliable if σ is small and N is large. A small σ can be obtained by choosing a density f(x) that is approximately proportional to g(x), making R nearly a constant with Std(R) ≈ 0. However, generating variables from such a density will usually be just as difficult as computing the integral I.

In fact, using a simple Standard Uniform distribution of X, we already obtain a lower standard deviation than in (8.5). Indeed, suppose that 0 ≤ g(x) ≤ 1 for 0 ≤ x ≤ 1. For a Uniform(0, 1) variable X, we have f(X) = 1, so that

σ² = Var R = Var g(X) = E g²(X) − E² g(X) = ∫₀¹ g²(x) dx − I² ≤ I − I²,

because g² ≤ g for 0 ≤ g ≤ 1. We conclude that for this method,

Std(Î) ≤ √( (I − I²)/N ) = √( I(1 − I)/N ).

Comparing with (8.5), we see that with the same number of simulations N, the latter method gives more accurate results. We can also say that to attain the same desired accuracy, the second method requires fewer simulations.

This can be extended to an arbitrary interval [a, b] and any function g with values in [0, c].
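An added sketch comparing the two estimators on the integral I = ∫₀¹ x² dx = 1/3, where both standard deviations can be checked empirically:

# Compare the hit-or-miss and the averaging Monte Carlo integral estimators
g <- function(x) x^2           # example integrand on [0,1]; true integral is 1/3
N <- 10000; M <- 1000          # N points per estimate, M repeated estimates
hit <- avg <- numeric(M)
for (m in 1:M) {
  U <- runif(N); V <- runif(N)
  hit[m] <- mean(V < g(U))     # hit-or-miss (proportion) estimator
  avg[m] <- mean(g(U))         # averaging estimator, E g(X) for X ~ Uniform(0,1)
}
c(sd_hit = sd(hit), sd_avg = sd(avg))   # averaging shows the smaller standard deviation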

8.6 Markov Chains Monte Carlo

Markov chain Monte Carlo (MCMC) makes it possible to generate samples from an unknown and computationally unavailable distribution. This situation typically occurs in problems with multidimensional parameters, where the prior distributions are not conjugate to the model, so the posterior does not have a tractable form.

MCMC is an iterative algorithm that generates a sequence of samples according to a wisely designed Markov chain that converges to the desired limiting, steady-state distribution.

Several popular algorithms exist for this purpose.

8.7 Gibbs Sampler

Consider a problem with several parameters of interest, θ_1, ..., θ_d. Each of them has a prior distribution, and the entire set θ = (θ_1, ..., θ_d) has a joint prior π(θ). These parameters have different meanings: they represent means, variances, shapes, slopes, etc., and so their joint posterior distribution is intractable. Its exact computation requires multiple integration, and the resulting multivariate density does not have any analytic form.

Example 8.5 (Normal distribution with unknown mean and variance). As one of the simplest examples illustrating this situation, consider the model

X_1, ..., X_n ∼ Normal(θ, σ²)

with both θ and σ² being unknown parameters.

In the very best case, suppose that a priori, θ and σ² are distributed independently of each other, according to their corresponding conjugate priors,

θ ∼ Normal(µ, τ²)
σ² ∼ Inverse Gamma(α, β).

Alternatively (and it is not harder at all), we can assume the improper joint prior
\[
\pi(\theta, \sigma^2) = \frac{1}{\sigma^2}, \qquad \theta \in \mathbb{R},\ \sigma > 0,
\]
which is 1 · (1/σ²), meaning that θ and σ² are still independent of each other, and each of them has its natural non-informative distribution.

If we fix one parameter and vary the other, we have our old known problem.

Let us fix σ². Then the posterior distribution of θ, given data X, is Normal with posterior parameters
\[
\mu_x = \frac{n\bar{X}/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2}
\qquad \text{and} \qquad
\tau_x^2 = \frac{1}{n/\sigma^2 + 1/\tau^2},
\]
or Normal(X̄, σ²/n) in the case of the improper prior. But this is a posterior distribution conditioned on σ².

Conversely, if we fix θ, then the posterior distribution of σ² is Inverse Gamma with posterior parameters
\[
\alpha_x = \frac{n}{2} - 1
\qquad \text{and} \qquad
\beta_x = \frac{2}{\sum_i (X_i - \theta)^2}.
\]
This is conditioned on θ, which is an unknown quantity. For example, we can use this posterior density to calculate
\[
P\left\{ \sigma^2 > 5 \mid X, \theta \right\},
\]
but unconditionally of θ, we cannot yet compute P{σ² > 5 | X}, which is needed for testing the hypothesis σ² > 5. We cannot even compute the posterior mean or median of σ² directly. For the posterior analysis (which is Bayesian analysis!) of θ and σ², we need their marginal and/or joint posterior distributions, free of any unknown quantities. However, even for this rather simple two-parameter model, the joint posterior distribution
\[
\pi(\theta, \sigma^2 \mid X)
\]
is intractable. ♦

Gibbs sampling will solve the problem in this Example. It cannot derive the closed form of the posterior density, but it is able to produce any number of pairs (θ, σ²) coming from that very joint posterior distribution.

The Gibbs sampler is a type of MCMC, a computer-generated Markov chain, where transitions occur according to the conditional distributions of one parameter, given all the other quantities in the problem.

Step 0, initialization. Start with some possible parameter values θ^{(0)} = (θ_1^{(0)}, ..., θ_d^{(0)}).

Step 1, the first cycle of iterations.

(1.1) Generate θ_1^{(1)} from the conditional posterior distribution π(θ_1 | X, θ_2^{(0)}, ..., θ_d^{(0)}).

(1.2) Generate θ_2^{(1)} from the conditional posterior distribution π(θ_2 | X, θ_1^{(1)}, θ_3^{(0)}, ..., θ_d^{(0)}).

(1.3) Generate θ_3^{(1)} from the conditional posterior distribution π(θ_3 | X, θ_1^{(1)}, θ_2^{(1)}, θ_4^{(0)}, ..., θ_d^{(0)}).

..................

(1.d) Generate θ_d^{(1)} from the conditional posterior distribution π(θ_d | X, θ_1^{(1)}, ..., θ_{d−1}^{(1)}).

This concludes the 1st round of iterations. As a result, we have a set of parameters θ^{(1)} = (θ_1^{(1)}, ..., θ_d^{(1)}) generated from θ^{(0)}.

Step 2, the 2nd cycle of iterations.

(2.1) Generate θ_1^{(2)} from the conditional posterior distribution π(θ_1 | X, θ_2^{(1)}, ..., θ_d^{(1)}).

·················· .

During each simulation, the most recently generated parameter values are being used.

Iterations continue until the MCMC "converges", i.e., until it stabilizes. It does not converge to a constant: during each step, new parameters are still generated according to the conditional distributions, but the overall joint distribution of θ^{(N)} must become stable after a sufficient number of steps.

When ranges of parameters are independent of each other, the MCMC will converge to the correct joint posterior distribution.
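Before turning to WinBUGS, here is a minimal R sketch of the Gibbs sampler for Example 8.5, assuming the improper prior π(θ, σ²) = 1/σ². Under this prior, θ | σ², X is Normal(X̄, σ²/n) as stated above, and we sample the Inverse Gamma conditional of σ² by drawing 1/σ² from a Gamma distribution with shape n/2 and rate Σ(X_i − θ)²/2 (the standard shape–rate parametrization, which may differ from the Inverse Gamma convention used in these notes). The data are borrowed from Example 8.6.

set.seed(1)
X <- c(-1.4, 0.3, 1.9)                 # data from Example 8.6
n <- length(X); xbar <- mean(X)
M <- 10000                             # number of Gibbs cycles
theta <- numeric(M); sig2 <- numeric(M)
theta[1] <- xbar; sig2[1] <- var(X)    # step 0: initialization
for (t in 2:M) {
   # generate theta from pi(theta | X, sigma^2) = Normal(xbar, sigma^2/n)
   theta[t] <- rnorm(1, xbar, sqrt(sig2[t-1]/n))
   # generate sigma^2 from its conditional posterior given theta
   sig2[t] <- 1 / rgamma(1, shape = n/2, rate = sum((X - theta[t])^2)/2)
}
keep <- 1001:M                         # discard a burn-in sequence
mean(theta[keep]); median(sig2[keep])  # posterior summaries

The retained pairs (θ, σ²) behave as draws from the joint posterior, so posterior probabilities, means, and medians of either parameter can be estimated by averaging over them.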

Example 8.6 (Two-parameter Normal model, continued). The following WinBUGS code can be used for the problem in the previous example. The Gibbs sampler converges to the joint posterior distribution of θ and σ².

Notice that the 2nd parameter of the Normal distribution in WinBUGS is "precision", which is 1/σ². Accordingly, a Gamma prior distribution is used for it instead of an Inverse Gamma.

model {
   for (i in 1:n) { X[i] ~ dnorm(theta,sigma) }
   theta ~ dnorm(mu,tau)
   sigma ~ dgamma(alpha,beta)
}

list( n=3, X = c(-1.4,0.3,1.9), mu=0, tau=10, alpha=5, beta=0.8 )

After convergence, the resulting posterior parameter estimates are given in the table:

node    mean     sd      MC error  2.5%     median   97.5%  start  sample
sigma   1.768    0.7007  0.007619  0.6789   1.672    3.434  1      10000
theta   0.08807  0.2592  0.002648  -0.4173  0.08963  0.588  1      10000

The marginal posterior densities (unconditional, independent of the other parameter!) are shown in Figure 8.4. ♦

FIGURE 8.4: Estimated posterior densities π(θ|X) and π(σ|X) for the two-parameter Normal distribution problem.

8.8 Central Limit Theorem for the Posterior Distribution

Another popular way of managing complex posterior structures is finding a working approximation of the posterior distribution, typically by a Normal distribution.

We've seen that Gamma, Inverse Gamma, Binomial, Negative Binomial, and some other distributions are approximately Normal when the corresponding parameter is sufficiently large. In fact, we can prove a general result that the posterior π(θ|X) is approximately Normal when the densities are sufficiently smooth and the sample size is sufficiently large.

8.8.1 Preparation

Recall a few facts from Mathematical Statistics.

1. Maximum likelihood estimation. Suppose we observe a sample X = (X_1,...,X_n) from the joint density or probability mass function f(x|θ). The maximum likelihood estimator (MLE) of θ is the parameter value θ̂ that maximizes f(X|θ) for the given data set X.

Under regularity conditions (essentially meaning that the density f(x|θ) is twice differentiable in θ for any x, and differentiation under the integral sign is justified), we have
\[
\mathbf{E}_\theta(\hat{\theta}) \to \theta
\quad \text{and} \quad
\mathrm{Var}_\theta(\hat{\theta}) \to 0, \quad \text{as } n \to \infty,
\]
and therefore, θ̂ → θ in probability,
\[
P\left\{ |\hat{\theta} - \theta| > \varepsilon \right\} \to 0, \quad \text{as } n \to \infty, \ \text{for all } \varepsilon > 0.
\]

In short, this means that the MLE θ̂ is consistent.

2. Fisher information. The Fisher information number is defined as the expected square of the log-derivative of the joint density,
\[
I(\theta) := \mathbf{E}_\theta \left( \frac{\partial \ln f(X \mid \theta)}{\partial \theta} \right)^2 .
\]
Denoting the derivative of f(X|θ) with respect to θ by f′ and using the regularity conditions, we get an alternative formula for I(θ),
\[
I(\theta) = \mathbf{E}_\theta \left( \frac{f'}{f} \right)^2
= \int \left( \frac{f'}{f} \right)^2 f
= \int \frac{(f')^2}{f}
= \int \left( \frac{(f')^2}{f} - f'' \right)
\]
(because by the regularity conditions, $\int f'' = \left( \int f \right)'' = (1)'' = 0$)
\[
= \int \frac{(f')^2 - f''f}{f}
= -\mathbf{E}_\theta \frac{f''f - (f')^2}{f^2}
= -\mathbf{E}_\theta \frac{\partial (f'/f)}{\partial \theta}
= -\mathbf{E}_\theta \frac{\partial^2}{\partial \theta^2} \ln f(X \mid \theta).
\]
So,
\[
I(\theta) = -\mathbf{E} (\ln f)''.
\]
For a sample of size n, $I_X(\theta) = n\,I(\theta)$.

3. The Cramér–Rao inequality. This is where you probably saw a reference to Fisher information: for any estimator δ of θ with bias b(θ) and variance v(θ), under the regularity conditions,
\[
\mathrm{Var}_\theta(\delta) \ge \frac{(1 + b'(\theta))^2}{I(\theta)}.
\]

8.8.2 Main result, asymptotic normality

Theorem 2. Let X = (X_1,...,X_n) be a sample of independent, identically distributed observations with density or probability mass function f(x|θ) that satisfies the regularity conditions. Then the posterior distribution of θ is asymptotically Normal,
\[
\pi(\theta \mid X) \approx \mathrm{Normal}\left( \hat{\theta},\ \frac{1}{n I(\hat{\theta})} \right), \quad \text{as } n \to \infty.
\]

Proof.

This is a rather classical proof. We write

\[
f(X \mid \theta) = e^{\ln f(X \mid \theta)}
\]
and apply the Taylor expansion of ln f at the point θ = θ̂,
\[
\ln f(X \mid \theta) = \ln f(X \mid \hat{\theta})
+ \left( \ln f(X \mid \theta) \right)'\big|_{\theta=\hat{\theta}} (\theta - \hat{\theta})
+ \frac{1}{2} \left( \ln f(X \mid \theta) \right)''\big|_{\theta=\hat{\theta}} (\theta - \hat{\theta})^2
+ O(\theta - \hat{\theta})^3 .
\]


In this formula, $(\ln f(X \mid \theta))'\big|_{\theta=\hat{\theta}} = 0$ because θ̂, the MLE, is the point of maximum, and the derivative equals 0 there.

Also, $(\ln f(X \mid \theta))''\big|_{\theta=\hat{\theta}}$ converges in probability to $\mathbf{E}_{\theta=\hat{\theta}} (\ln f(X \mid \hat{\theta}))'' = -I_X(\hat{\theta})$, because $\mathrm{Var}(\ln f(X \mid \hat{\theta}))'' \to 0$ as long as $\mathrm{Var}(\hat{\theta}) \to 0$.

Plugging these results into the model and then the posterior distribution, we get

\[
f(X \mid \theta) \approx e^{\ln f(X \mid \hat{\theta}) - \frac{1}{2} I_X(\hat{\theta}) (\theta - \hat{\theta})^2}
= f(X \mid \hat{\theta})\, e^{-\frac{1}{2} I_X(\hat{\theta}) (\theta - \hat{\theta})^2}
\]
(already looking like a Normal density of θ) and

\[
\pi(\theta \mid X) \approx \frac{ f(X \mid \hat{\theta})\, e^{-\frac{1}{2} I_X(\hat{\theta})(\theta - \hat{\theta})^2}\, \pi(\theta) }
{ \int f(X \mid \hat{\theta})\, e^{-\frac{1}{2} I_X(\hat{\theta})(\theta - \hat{\theta})^2}\, \pi(\theta)\, d\theta }.
\]
Why did I cancel π(θ)? In the region where |θ − θ̂| ≤ ε, π(θ) ≈ π(θ̂), a constant in θ, and it cancels. On the other hand, in the region where |θ − θ̂| > ε, we have $e^{-\frac{1}{2} I_X(\hat{\theta})(\theta - \hat{\theta})^2} \to 0$ because $I_X(\hat{\theta}) = n I(\hat{\theta}) \to \infty$.

All right, now we have obtained
\[
\pi(\theta \mid X) \sim e^{-\frac{1}{2} I_X(\hat{\theta}) (\theta - \hat{\theta})^2},
\]
a Normal density with mean θ̂ and variance $I_X^{-1}(\hat{\theta}) = \dfrac{1}{n I(\hat{\theta})}$.
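As a quick illustration (our own worked example, not from the notes), consider a sample of Bernoulli(θ) observations. Then ln f(x|θ) = x ln θ + (1 − x) ln(1 − θ), so
\[
I(\theta) = -\mathbf{E}\,\frac{\partial^2}{\partial \theta^2} \ln f(X \mid \theta)
= \mathbf{E}\left[ \frac{X}{\theta^2} + \frac{1-X}{(1-\theta)^2} \right]
= \frac{1}{\theta(1-\theta)},
\]
and the MLE is θ̂ = X̄. Theorem 2 then gives
\[
\pi(\theta \mid X) \approx \mathrm{Normal}\left( \bar{X},\ \frac{\bar{X}(1-\bar{X})}{n} \right)
\]
for large n, regardless of the (smooth, positive) prior.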

Christensen, Ch. 6; Berger, p. 224

Chapter 9

Computer Generation of Random Variables

9.1 Random number generators ...... 115
9.2 Discrete methods ...... 116
9.3 Inverse transform method ...... 120
9.4 Rejection method ...... 123
9.5 Generation of random vectors ...... 125
9.6 Gibbs Sampler ...... 126
9.7 The Metropolis Algorithm ...... 129
9.8 The Metropolis-Hastings Algorithm ...... 130

As we see, implementation of Monte Carlo methods reduces to generation of random vari- ables from given distributions. Hence, it remains to design algorithms for generating random variables and vectors that will have the desired distributions.

Statistics software packages like SAS, R, Splus, SPSS, Minitab, and others have built-in procedures for the generation of random variables from the most common discrete and continuous distributions. Similar tools are found in recent versions of MATLAB, Microsoft Excel, and some libraries.

The majority of computer languages have a random number generator that returns only Uniformly distributed independent random variables. This section discusses general methods of transforming Uniform random variables, obtained by a standard random number generator, into variables with desired distributions.

9.1 Random number generators

Obtaining a good random variable is not a simple task. How do we know that it is “truly random” and does not have any undesired patterns? For example, quality random number generation is so important in coding and password creation that people design special tests to verify the “randomness” of generated numbers.

In the fields sensitive to good random numbers, their generation is typically related to an accurate measurement of some physical variable, for example, the computer time or noise. Certain transformations of this variable will generally give the desired result.


More often than not, a pseudo-random number generator is utilized. This is nothing but a very long list of numbers. A user specifies a random number seed that points to the location from which this list will be read. It should be noted that if the same computer code is executed with the same seed, then the same random numbers get generated, leading to identical results. Often each seed is generated within the system, which of course improves the quality of random numbers.

Instead of a computer, a table of random numbers may be used for small-size studies.

Can a table adequately replace a computer random number generator? There are at least two issues that one should be aware of.

1. Generating a random number seed. A rule of thumb suggests to close your eyes and put your finger “somewhere” on the table. Start reading random numbers from this point, either horizontally, or vertically, or even diagonally, etc. Either way, this method is not pattern-free, and it does not guarantee “perfect” randomness.

2. Using the same table more than once. Most likely, we cannot find a new table of random numbers for each Monte Carlo study. As soon as we use the same table again, we may no longer consider our results independent of each other. Should we be concerned about it? It depends on the situation. For totally unrelated projects, this should not be a problem.

One way or another, a random number generator or a table of random numbers delivers to us Uniform random variables U_1, U_2, ... ∈ (0, 1). The next three sections show how to transform them into a random variable (vector) X with the given desired distribution F(x).

Notation: U; U_1, U_2, ... = generated Uniform(0,1) random variables

9.2 Discrete methods

At this point, we have obtained one or several independent Uniform(0,1) random variables by means of a random number generator or a table of random numbers. Variables from certain simple distributions can be immediately generated from this.

Example 9.1 (Bernoulli). First, simulate a Bernoulli trial with probability of success p. For a Standard Uniform variable U, define
\[
X = \begin{cases} 1 & \text{if } U < p \\ 0 & \text{if } U \ge p \end{cases}
\]
We call it "a success" if X = 1 and "a failure" if X = 0. Using the Uniform distribution of U, we find that
\[
P\{\text{success}\} = P\{U < p\} = p.
\]

A MATLAB code for this scheme is

U = rand; X = (U < p);

The value of p should be specified prior to this program. ♦

Example 9.2 (Binomial). Once we know how to generate Bernoulli variables, we can obtain a Binomial variable as a sum of n independent Bernoulli variables. For this purpose, we start with n Uniform random numbers, for example:

n = 20; p = 0.68; U = rand(n,1); X = sum(U < p) ♦

Example 9.3 (Geometric). A while-loop of Bernoulli trials will generate a Geometric random variable, literally according to its definition. We run the loop of trials until the first success occurs. Variable X counts the number of trials:

X = 1;            % need at least one trial
while rand > p;   % continue while there are failures
   X = X + 1;
end;              % stop at the first success
X

The percentage sign (%) is used to separate MATLAB statements from comments. ♦

Example 9.4 (Negative Binomial). Just as in Example 9.2, once we know how to generate a Geometric variable, we can generate a number of them and obtain a Negative Binomial(k,p) variable as a sum of k independent Geometric(p) variables. ♦

Arbitrary discrete distribution


FIGURE 9.1: Generating discrete random variables. The value of X is determined by the region where the generated value of U belongs.

Example 9.1 can be extended to any arbitrary discrete distribution. In that Example, knowing that the random number generator returns a number between 0 and 1, we divided the interval [0, 1] into two parts, of lengths p and (1 − p). Then we determined the value of X according to the part where the generated value of U fell, as in Figure 9.1a.

Now consider an arbitrary discrete random variable X that takes values x_0, x_1, ... with probabilities p_0, p_1, ...,
\[
p_i = P\{X = x_i\}, \qquad \sum_i p_i = 1.
\]
A scheme similar to Example 9.1 can be applied as follows.

Algorithm 9.1 (Generating discrete variables)

1. Divide the interval [0, 1] into subintervals as shown in Figure 9.1b,

A_0 = [0, p_0)
A_1 = [p_0, p_0 + p_1)
A_2 = [p_0 + p_1, p_0 + p_1 + p_2)
etc.

Subinterval Ai will have length pi; there may be a finite or infinite number of them, according to possible values of X.

2. Obtain a Standard Uniform random variable from a random number generator or a table of random numbers.

3. If U belongs to Ai, let X = xi.

From the Uniform distribution, it follows again that

\[
P\{X = x_i\} = P\{U \in A_i\} = p_i .
\]

Hence, the generated variable X has the desired distribution.

Notice that, contrary to Examples 9.2 and 9.4, this algorithm is economical, as it requires only one Uniform random number for each generated variable.

Values xi can be written in any order, but they have to correspond to their probabilities pi.
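For reference, here is a minimal R sketch of Algorithm 9.1 for a finite set of values. The vectors x and p below are hypothetical and only illustrate the mechanics; R's built-in sample(x, 1, prob = p) accomplishes the same task.

# Algorithm 9.1: one draw from an arbitrary finite discrete distribution
x <- c(10, 20, 30); p <- c(0.2, 0.5, 0.3)   # values and their probabilities
U <- runif(1)                               # Standard Uniform variable
i <- findInterval(U, cumsum(p)) + 1         # index of the subinterval A_i containing U
X <- x[i]                                   # the generated value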

Example 9.5 (Poisson). Let us use Algorithm 9.1 to generate a Poisson variable with parameter λ = 5.

Recall that a Poisson variable takes values x0 = 0, x1 = 1, x2 = 2, ... with probabilities

\[
p_i = P\{X = x_i\} = e^{-\lambda} \frac{\lambda^i}{i!} \qquad \text{for } i = 0, 1, 2, \ldots
\]

Following the algorithm, we generate a Uniform random number U and find the set Ai containing U, so that

\[
p_0 + \ldots + p_{i-1} \le U < p_0 + \ldots + p_{i-1} + p_i,
\]
or, in terms of the cumulative distribution function,
\[
F(i-1) \le U < F(i).
\]
This can be done in MATLAB by means of a while-loop:

lambda = 5;           % parameter
U = rand;             % generated Uniform variable
i = 0;                % initial value
F = exp(-lambda);     % initial value, F(0)
while (U >= F);       % the loop ends when U < F, i.e., when U belongs to A_i
   i = i + 1;
   F = F + exp(-lambda) * lambda^i / factorial(i);   % add p_i to F
end;
X = i                 % the generated Poisson variable

Remark: The Statistics Toolbox of MATLAB has built-in tools for generating random variables from certain given distributions. Generating Binomial, Geometric, Poisson, and some other variables can be done in one line, random(name,parameters), where "name" indicates the desired family of distributions, or by commands binornd, poissrnd, etc., listed in our distribution inventory. This chapter discusses general methods of generating random variables from arbitrary distributions. The codes given here are general too – they can be used as flow-charts and be directly translated into other languages.

9.3 Inverse transform method

As Algorithm 9.1 applies to all discrete distributions, we now turn attention to the genera- tion of continuous random variables. The method will be based on the following simple yet surprising fact.

Theorem 3 Let X be a continuous random variable with cdf FX (x). Define a random variable U = FX (X). The distribution of U is Uniform(0,1).

Proof: First, we notice that 0 ≤ F (x) ≤ 1 for all x, therefore, values of U lie in [0, 1]. Second, for any u ∈ [0, 1], find the cdf of U,

\[
\begin{aligned}
F_U(u) &= P\{U \le u\} = P\{F_X(X) \le u\} \\
&= P\{X \le F_X^{-1}(u)\} && \text{(solve the inequality for } X\text{)} \\
&= F_X(F_X^{-1}(u)) && \text{(by definition of cdf)} \\
&= u && (F_X \text{ and } F_X^{-1} \text{ cancel})
\end{aligned}
\]

In the case when F_X is not invertible, let F_X^{-1}(u) denote the smallest x such that F_X(x) = u.

We see that U has cdf F_U(u) = u and density f_U(u) = F_U′(u) = 1 for 0 ≤ u ≤ 1. This is the Uniform(0,1) density; hence, U has the Uniform(0,1) distribution.

Regardless of the initial distribution of X, it becomes Uniform(0,1), once X is substituted into its own cumulative distribution function!
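A quick empirical check of Theorem 3 in R, under assumed Exponential data: plugging a sample into its own cdf should produce an approximately flat histogram on [0, 1].

X <- rexp(10000, rate = 2)   # any continuous distribution works here
U <- pexp(X, rate = 2)       # U = F_X(X)
hist(U)                      # approximately Uniform(0,1)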

Arbitrary continuous distribution

In order to generate a variable X with the given continuous cdf F, let us invert the formula U = F(X). Then X can be obtained from a generated Standard Uniform variable U as X = F^{-1}(U).

Algorithm 9.2 (Generating continuous variables)

1. Obtain a Standard Uniform random variable from a random number generator.

2. Compute X = F^{-1}(U). In other words, solve the equation F(X) = U for X.

Example 9.6 (Exponential). How shall we generate an Exponential variable with parameter λ? According to Algorithm 9.2, we start by generating a Uniform random variable U. Then we recall that the Exponential cdf is F(x) = 1 − e^{−λx} and solve the equation
\[
1 - e^{-\lambda X} = U.
\]
The result is
\[
X = -\frac{1}{\lambda} \ln(1 - U). \qquad (9.1)
\]

Can this formula be simplified? By any rule of algebra, the answer is "no." On the other hand, (1 − U) has the same distribution as U, Standard Uniform. Therefore, we can replace U by (1 − U), and the variable
\[
X_1 = -\frac{1}{\lambda} \ln U, \qquad (9.2)
\]
although different from X, will also have the desired Exponential(λ) distribution.

Let us check whether our generated variables, X and X_1, have a suitable range of values. We know that 0 < U < 1 with probability 1; hence ln U and ln(1 − U) are negative, so both X and X_1 take positive values, as Exponential variables should. ♦
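In R, formula (9.2) gives a one-line Exponential generator (a sketch; λ = 2 is an arbitrary illustrative value):

lambda <- 2; N <- 10000
X <- -log(runif(N)) / lambda   # Exponential(lambda) variables by (9.2)
c(mean(X), 1/lambda)           # the sample mean should be close to 1/lambda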

Example 9.7 (Gamma). The Gamma cdf has a complicated integral form, not to mention F^{-1}, needed for Algorithm 9.2. There are numerical methods of solving the equation F(X) = U, but alternatively, and much more simply, for any integer α a Gamma variable can be generated as a sum of α independent Exponential variables:

X = sum( -1/lambda * log(rand(alpha,1)) )

Discrete distributions revisited

Algorithm 9.2 is not directly applicable to discrete distributions because the inverse function F^{-1} does not exist in the discrete case. In other words, the key equation F(X) = U may have either infinitely many roots or no roots at all (see Figure 9.2 on p. 122).

Moreover, a discrete variable X has a finite or countable range of possible values, and so does F (X). The probability that U coincidentally equals one of these values is 0. We conclude that the equation F (X)= U has no roots with probability 1.

Let us modify the scheme in order to accommodate discrete variables. Instead of solving


FIGURE 9.2: The probability mass function P (x) and the cumulative distribution function F (x) of the Binomial(3,0.5) distribution. White circles denote excluded points.

F (x)= U exactly, which is impossible, we solve it approximately by finding x, the smallest possible value of X such that F (x) >U.

The resulting algorithm is described below.

Algorithm 9.3 (Generating discrete variables, revisited)

1. Obtain a Standard Uniform random variable from a random number generator or a table of random numbers.

2. Compute X = min{ x ∈ S : F(x) > U }, where S is the set of possible values of X.

Algorithms 9.1 and 9.3 are equivalent if values xi are arranged in their increasing order in Algorithm 9.1.

Example 9.8 (Geometric revisited). Applying Algorithm 9.3 to the Geometric cdf
\[
F(x) = 1 - (1-p)^x,
\]
we need to find the smallest integer x satisfying the inequality (solve it for x)
\[
1 - (1-p)^x > U, \qquad (1-p)^x < 1 - U, \qquad x \ln(1-p) < \ln(1-U),
\]
\[
x > \frac{\ln(1-U)}{\ln(1-p)} \qquad \text{(changing the sign because } \ln(1-p) < 0\text{)}.
\]
The smallest such integer is the ceiling of ln(1−U)/ln(1−p), so that
\[
X = \left\lceil \frac{\ln(1-U)}{\ln(1-p)} \right\rceil. \qquad (9.3)
\]
Remark: The ceiling function ⌈x⌉ is defined as the smallest integer that is no less than x. ♦
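In R, formula (9.3) turns into a one-line Geometric generator (a sketch; p = 0.3 is an arbitrary illustrative value):

p <- 0.3
X <- ceiling(log(1 - runif(1)) / log(1 - p))   # Geometric(p) by (9.3)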

9.4 Rejection method

Besides a good computer, one thing is needed to generate continuous variables using the inverse transform method of Section 9.3: a reasonably simple form of the cdf F(x) that allows direct computation of X = F^{-1}(U). When F(x) has a complicated form but a density f(x) is available, random variables with this density can be generated by the rejection method. Consider a point (X, Y) chosen at random from under the graph of the density f(x), as shown in Figure 9.3. What is the distribution of X, its first coordinate?

Theorem 4 Let a pair (X, Y ) have Uniform distribution over the region

\[
A = \{ (x, y) \mid 0 \le y \le f(x) \}
\]
for some density function f. Then f is the density of X.

Proof: The Uniform distribution has a constant density. In the case of a pair (X, Y), this density equals 1 because
\[
\iint_A 1\, dy\, dx = \int_x \left( \int_{y=0}^{f(x)} 1\, dy \right) dx = \int_x f(x)\, dx = 1,
\]
because f(x) is a density.

The marginal density of X obtains from the joint density by integration,

\[
f_X(x) = \int f_{(X,Y)}(x, y)\, dy = \int_{y=0}^{f(x)} 1\, dy = f(x).
\]


FIGURE 9.3: Rejection method. A pair (X, Y) should have a Uniform distribution in the region under the graph of f(x).

It remains to generate Uniform points in the region A. For this purpose, we select a bounding box around the graph of f(x), as shown in Figure 9.3, generate a random point in this rectangle, and reject all the points not belonging to A. The remaining points are Uniformly distributed in A.

Algorithm 9.4 (Rejection method)

1. Find numbers a, b, and c such that 0 ≤ f(x) ≤ c for a ≤ x ≤ b. The bounding box stretches along the x-axis from a to b and along the y-axis from 0 to c.

2. Obtain Standard Uniform random variables U and V from a random number generator or a table of random numbers.

3. Define X = a + (b − a)U and Y = cV. Then X has Uniform(a,b) distribution, Y is Uniform(0,c), and the point (X, Y) is Uniformly distributed in the bounding box.

4. If Y > f(X), reject the point and return to step 2. If Y ≤ f(X), then X is the desired random variable having the density f(x).
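A generic R sketch of Algorithm 9.4 may look as follows; the function name rejection_draw and its arguments are our own, and the inputs f, a, b, c must satisfy step 1.

# One draw from density f by the rejection method (Algorithm 9.4)
rejection_draw <- function(f, a, b, c) {
   repeat {
      X <- a + (b - a) * runif(1)   # step 3: X is Uniform(a,b)
      Y <- c * runif(1)             # step 3: Y is Uniform(0,c)
      if (Y <= f(X)) return(X)      # step 4: accept; otherwise repeat
   }
}
# For instance, Beta(5.5, 3.1) as in Example 9.9 below:
# rejection_draw(function(x) dbeta(x, 5.5, 3.1), 0, 1, 2.5)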

Example 9.9 (Rejection method for Beta distribution). The Beta distribution has density
\[
f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1} \qquad \text{for } 0 \le x \le 1.
\]
For α = 5.5 and β = 3.1, this density is graphed in Figure 9.3. It never exceeds 2.5; therefore we choose the bounding box with a = 0, b = 1, and c = 2.5. MATLAB commands for generating a Beta(α, β) random variable by the rejection method are:

alpha = 5.5; beta = 3.1;
a = 0; b = 1; c = 2.5;
X = 0; Y = c;              % initial values
while Y > gamma(alpha+beta)/gamma(alpha)/gamma(beta) ...
          * X.^(alpha-1) .* (1-X).^(beta-1);
   U = rand; V = rand;
   X = a + (b-a)*U; Y = c*V;
end;
X

In this code, dot operations (preceded by a dot “.”) are applied separately to each element of a matrix (as opposed to operations with whole matrices).

A histogram of 10,000 random variables generated (in a loop) by this algorithm is shown in Figure 9.4. Compare its shape with the graph of the density f(x) in Figure 9.3. It is not as smooth as f(x) because of randomness and a finite sample size, but if the simulation algorithm is designed properly, the shapes should be similar. ♦ Computer Generation of Random Variables 125

FIGURE 9.4: A histogram of Beta random variables generated by the rejection method. Compare with Figure 9.3.

9.5 Generation of random vectors

Along the same lines, we can use the rejection method to generate random vectors with desired joint densities. The bounding box now becomes a multidimensional cube, in which we generate a Uniformly distributed random point (X_1, X_2, ..., X_n, Y), accepted only if Y ≤ f(X_1,...,X_n). Then the generated vector (X_1,...,X_n) has the desired joint density
\[
f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = \int f_{X_1,\ldots,X_n,Y}(x_1,\ldots,x_n,y)\, dy
= \int_0^{f(x_1,\ldots,x_n)} 1\, dy = f(x_1,\ldots,x_n).
\]
The inverse transform method can also be used to generate random vectors. Let us generate X_1 with the marginal cdf F_{X_1}. Observe its value, x_1, and generate X_2 from the conditional cdf
\[
F_{X_2 \mid X_1}(x_2 \mid x_1) =
\begin{cases}
\displaystyle \frac{\sum_{x \le x_2} P_{X_1,X_2}(x_1, x)}{P_{X_1}(x_1)} & \text{in the discrete case} \\[2ex]
\displaystyle \frac{\int_{-\infty}^{x_2} f_{X_1,X_2}(x_1, x)\, dx}{f_{X_1}(x_1)} & \text{in the continuous case}
\end{cases}
\]
Then generate X_3 from the cdf $F_{X_3 \mid X_1, X_2}(x_3 \mid x_1, x_2)$, etc.

The inverse transform method requires tractable expressions for the cdf F and the conditional cdf's. The rejection method requires only the joint density; however, it needs more random number generations due to rejections. This is not a significant disadvantage for relatively small Monte Carlo studies or relatively tight bounding boxes.


9.6 Gibbs Sampler

Consider a problem with several parameters of interest, θ_1,...,θ_d. Each of them has a prior distribution, and the entire set θ = (θ_1,...,θ_d) has a joint prior π(θ). These parameters have different meanings: they represent means, variances, shapes, slopes, etc., and so their joint posterior distribution is intractable. Its exact computation requires multiple integration, and the resulting multivariate density does not have any analytic form.

Example 9.10 (Normal distribution with unknown mean and variance). As one of the simplest examples illustrating this situation, consider the model

X_1, ..., X_n ∼ Normal(θ, σ²)

with both θ and σ² being unknown parameters.

In the very best case, suppose that a priori, θ and σ² are distributed independently of each other, according to their corresponding conjugate priors,

θ ∼ Normal(µ, τ²)
σ² ∼ Inverse Gamma(α, β).

Alternatively (and it is not harder at all), we can assume the improper joint prior
\[
\pi(\theta, \sigma^2) = \frac{1}{\sigma^2}, \qquad \theta \in \mathbb{R},\ \sigma > 0,
\]
which is 1 · (1/σ²), meaning that θ and σ² are still independent of each other, and each of them has its natural non-informative distribution.

If we fix one parameter and vary the other, we have our old known problem.

Let us fix σ². Then the posterior distribution of θ, given data X, is Normal with posterior parameters
\[
\mu_x = \frac{n\bar{X}/\sigma^2 + \mu/\tau^2}{n/\sigma^2 + 1/\tau^2}
\qquad \text{and} \qquad
\tau_x^2 = \frac{1}{n/\sigma^2 + 1/\tau^2},
\]
or Normal(X̄, σ²/n) in the case of the improper prior. But this is a posterior distribution conditioned on σ².

Conversely, if we fix θ, then the posterior distribution of σ² is Inverse Gamma with posterior parameters
\[
\alpha_x = \frac{n}{2} - 1
\qquad \text{and} \qquad
\beta_x = \frac{2}{\sum_i (X_i - \theta)^2}.
\]
This is conditioned on θ, which is an unknown quantity. For example, we can use this posterior density to calculate
\[
P\left\{ \sigma^2 > 5 \mid X, \theta \right\},
\]
but unconditionally of θ, we cannot yet compute P{σ² > 5 | X}, which is needed for testing the hypothesis σ² > 5. We cannot even compute the posterior mean or median of σ² directly. For the posterior analysis (which is Bayesian analysis!) of θ and σ², we need their marginal and/or joint posterior distributions, free of any unknown quantities. However, even for this rather simple two-parameter model, the joint posterior distribution
\[
\pi(\theta, \sigma^2 \mid X)
\]
is intractable. ♦

Gibbs sampling will solve the problem in this Example. It cannot derive the closed form of the posterior density, but it is able to produce any number of pairs (θ, σ²) coming from that very joint posterior distribution.

The Gibbs sampler is a type of MCMC, a computer-generated Markov chain, where transitions occur according to the conditional distributions of one parameter, given all the other quantities in the problem.

Step 0, initialization. Start with some possible parameter values θ^{(0)} = (θ_1^{(0)}, ..., θ_d^{(0)}).

Step 1, the first cycle of iterations.

(1.1) Generate θ_1^{(1)} from the conditional posterior distribution π(θ_1 | X, θ_2^{(0)}, ..., θ_d^{(0)}).

(1.2) Generate θ_2^{(1)} from the conditional posterior distribution π(θ_2 | X, θ_1^{(1)}, θ_3^{(0)}, ..., θ_d^{(0)}).

(1.3) Generate θ_3^{(1)} from the conditional posterior distribution π(θ_3 | X, θ_1^{(1)}, θ_2^{(1)}, θ_4^{(0)}, ..., θ_d^{(0)}).

..................

(1.d) Generate θ_d^{(1)} from the conditional posterior distribution π(θ_d | X, θ_1^{(1)}, ..., θ_{d−1}^{(1)}).

This concludes the 1st round of iterations. As a result, we have a set of parameters θ^{(1)} = (θ_1^{(1)}, ..., θ_d^{(1)}) generated from θ^{(0)}.

Step 2, the 2nd cycle of iterations.

(2.1) Generate θ_1^{(2)} from the conditional posterior distribution π(θ_1 | X, θ_2^{(1)}, ..., θ_d^{(1)}).

·················· .

During each simulation, the most recently generated parameter values are being used.

Iterations continue until the MCMC "converges", i.e., until it stabilizes. It does not converge to a constant: during each step, new parameters are still generated according to the conditional distributions, but the overall joint distribution of θ^{(N)} must become stable after a sufficient number of steps.

When ranges of parameters are independent of each other, the MCMC will converge to the correct joint posterior distribution.

Example 9.11 (Two-parameter Normal model, continued). The following WinBUGS code can be used for the problem in the previous example. The Gibbs sampler converges to the joint posterior distribution of θ and σ².

Notice that the 2nd parameter of the Normal distribution in WinBUGS is "precision", which is 1/σ². Accordingly, a Gamma prior distribution is used for it instead of an Inverse Gamma.

model {
   for (i in 1:n) { X[i] ~ dnorm(theta,sigma) }
   theta ~ dnorm(mu,tau)
   sigma ~ dgamma(alpha,beta)
}

list( n=3, X = c(-1.4,0.3,1.9), mu=0, tau=10, alpha=5, beta=0.8 )

After convergence, the resulting posterior parameter estimates are given in the table:

FIGURE 9.5: Estimated posterior densities π(θ|X) and π(σ|X) for the two-parameter Normal distribution problem.

node    mean     sd      MC error  2.5%     median   97.5%  start  sample
sigma   1.768    0.7007  0.007619  0.6789   1.672    3.434  1      10000
theta   0.08807  0.2592  0.002648  -0.4173  0.08963  0.588  1      10000

The marginal posterior densities (unconditional, independent of the other parameter!) are shown in Fig. 9.5. ♦

9.7 The Metropolis Algorithm

This algorithm can effectively be used to simulate samples of θ from its posterior distribution π(θ|X) whose closed-form expression is difficult to derive.

The algorithm only requires a function g(θ) that is proportional to the posterior density, g(θ) ∝ π(θ|X). Also, we need to choose a proposal distribution P(θ|η), a symmetric density that is easy to simulate from. By symmetry we mean P(θ|η) = P(η|θ). A popular choice is P(θ|η) ∼ Normal(η, σ²).

1. Choose a starting point θ0, which may be fixed or randomly drawn from a distribution.

2. Generate a candidate θ_1 from the distribution P(θ_1 | θ_0).

3. For this generated number, compute the acceptance ratio
\[
r(\theta_1) = \frac{g(\theta_1)}{g(\theta_0)} = \frac{\pi(\theta_1 \mid X)}{\pi(\theta_0 \mid X)}
\]
(recall that g(θ) is proportional to π(θ|X)).

4. Generate a Standard Uniform variable U ∼ Uniform(0,1).
   If U ≤ r(θ_1), then accept the candidate and move to generating θ_2.
   If U > r(θ_1), then reject the candidate, let θ_1 = θ_0, and move to generating θ_2.

5. Repeat steps 2–4 a large number of times. Every time, generate a candidate θ_{t+1} from the distribution P(θ_{t+1} | θ_t), calculate the acceptance ratio r, generate a Uniform variable, compare it with r, and decide on accepting or rejecting the candidate.

6. Remove the first batch of iterations (100 to 1000 values, the “burn-in” sequence), to avoid too much dependence on the initial value.

After the burn-in period, the algorithm will generate random variables θ_t from approximately the posterior distribution π(θ|X).

The Metropolis algorithm can be used for any distribution, and it is particularly handy for simulating θ from its posterior. In this case, we do have a function g(θ) that is proportional to π(θ|X),
\[
g(\theta) = f(X \mid \theta)\, \pi(\theta) \propto \pi(\theta \mid X).
\]
Often it is difficult to calculate the normalizing constant, the one that we have been dropping all the time, especially in the case of multivariate parameters. This constant, the marginal distribution of X, requires integration over θ, which may not even be feasible in non-trivial cases, when the prior π(θ) is not conjugate to the model f(X|θ).
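Here is a minimal random-walk Metropolis sketch in R. The target g is unnormalized and our own illustrative choice (proportional to a Normal(3,1) density); in a Bayesian problem one would substitute g(θ) = f(X|θ)π(θ).

g <- function(theta) exp(-(theta - 3)^2 / 2)   # any g proportional to the target
M <- 10000; s <- 1                             # iterations and proposal spread
theta <- numeric(M); theta[1] <- 0             # step 1: starting point
for (t in 2:M) {
   cand <- rnorm(1, theta[t-1], s)             # step 2: candidate from Normal(theta, s^2)
   r <- g(cand) / g(theta[t-1])                # step 3: acceptance ratio
   theta[t] <- if (runif(1) <= r) cand else theta[t-1]   # step 4: accept or reject
}
theta <- theta[-(1:1000)]                      # step 6: remove the burn-in sequence
hist(theta)                                    # approximately the target distribution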

9.8 The Metropolis-Hastings Algorithm

The Metropolis-Hastings Algorithm is a generalization of the Metropolis Algorithm, in which the proposal distribution does not have to be symmetric. To correct for the asymmetry in the proposal distribution, the acceptance ratio is defined as

\[
r(\theta_{t+1}) = \frac{g(\theta_{t+1}) / P(\theta_{t+1} \mid \theta_t)}{g(\theta_t) / P(\theta_t \mid \theta_{t+1})}.
\]

Christensen, Chap. 3, 6

Chapter 10

Review of Markov Chains

10.1 Main definitions ...... 131
10.2 Markov chains ...... 133
10.3 Matrix approach ...... 136
10.4 Steady-state distribution ...... 141

The general WinBUGS methodology for generating samples is called Markov chain Monte Carlo. Based on the distributions that we introduce in the "model", WinBUGS constructs a Markov chain of simulations that converges to a distribution of interest.

What is a Markov chain and how does it converge? A review of the main concepts is in this chapter.

10.1 Main definitions

DEFINITION 10.1

A stochastic process is a random variable that also depends on time. It is therefore a function of two arguments, X(t, ω), where:

• t is time, with t ∈ T, T being a set of possible times, usually [0, ∞), (−∞, ∞), {0, 1, 2, ...}, or {..., −2, −1, 0, 1, 2, ...};

• ω ∈ Ω, as before, is an outcome of an experiment, with Ω being the whole sample space.

Values of X(t, ω) are called states.

DEFINITION 10.2

A stochastic process X(t) is Markov if for any t_1 < ... < t_n < t and any sets A, A_1, ..., A_n,
\[
P\{X(t) \in A \mid X(t_1) \in A_1, \ldots, X(t_n) \in A_n\}
= P\{X(t) \in A \mid X(t_n) \in A_n\}. \qquad (10.1)
\]


Let us look at the equation (10.1). It means that the conditional distribution of X(t) is the same under two different conditions,

(1) given observations of the process X at several moments in the past;

(2) given only the latest observation of X.

If a process is Markov, then its future behavior is the same under conditions (1) and (2). In other words, knowing the present, we get no information from the past that can be used to predict the future,

\[
P\{\text{future} \mid \text{past, present}\} = P\{\text{future} \mid \text{present}\}.
\]
Then, for the future development of a Markov process, only its present state is important, and it does not matter how the process arrived at this state.

Some processes satisfy the Markov property, and some don’t.

Example 10.1 (Internet connections). Let X(t) be the total number of internet connections registered by some internet service provider by the time t. Typically, people connect to the internet at random times, regardless of how many connections have already been made. Therefore, the number of connections in the next minute depends only on the current number. For example, if 999 connections have been registered by 10 o'clock, then the chance that their total number exceeds 1000 during the next minute is the same regardless of when and how these 999 connections were made in the past. This process is Markov. ♦

Example 10.2 (Stock prices). Let Y (t) be the value of some stock or some market index at time t. If we know Y (t), do we also want to know Y (t 1) in order to predict − Y (t + 1)? One may argue that if Y (t 1) < Y (t), then the market is rising, therefore, − Y (t + 1) is likely (but not certain) to exceed Y (t). On the other hand, if Y (t 1) > Y (t), − we may conclude that the market is falling and may expect Y (t + 1) < Y (t). It looks like knowing the past in addition to the present did help us to predict the future. Then, this process is not Markov. ♦

Due to a well-developed theory and a number of simple techniques available for Markov processes, it is important to know whether a process is Markov or not. The idea of Markov dependence was proposed and developed by Andrei Markov (1856–1922), who was a student of P. Chebyshev at St. Petersburg University in Russia.

10.2 Markov chains

DEFINITION 10.3 A Markov chain is a discrete-time, discrete-state Markov stochastic process.

Let us introduce a few convenient simplifications. The time is discrete, so let us define the time set as T = {0, 1, 2, ...}. We can then look at a Markov chain as a random sequence {X(0), X(1), X(2), ...}.

The state set is also discrete, so let us enumerate the states as 1, 2, ..., n. Sometimes we'll start enumeration from state 0, and sometimes we'll deal with a Markov chain with infinitely many (discrete) states; then we'll have n = ∞.

The Markov property means that only the value of X(t) matters for predicting X(t+1), so the conditional probability

\[
p_{ij}(t) = P\{X(t+1) = j \mid X(t) = i\}
= P\{X(t+1) = j \mid X(t) = i,\ X(t-1) = h,\ X(t-2) = g,\ \ldots\} \qquad (10.2)
\]
depends on i, j, and t only and equals the probability for the Markov chain X to make a transition from state i to state j at time t.

DEFINITION 10.4

Probability p_{ij}(t) in (10.2) is called a transition probability. Probability
\[
p_{ij}^{(h)}(t) = P\{X(t+h) = j \mid X(t) = i\}
\]
of moving from state i to state j by means of h transitions is an h-step transition probability.

DEFINITION 10.5

A Markov chain is homogeneous if all its transition probabilities are independent of t. Being homogeneous means that transition from i to j has the same probability at any time. Then p_{ij}(t) = p_{ij} and p_{ij}^{(h)}(t) = p_{ij}^{(h)}.

Characteristics of a Markov chain

What do we need to know to describe a Markov chain?

By the Markov property, each next state should be predicted from the previous state only. Therefore, it is sufficient to know the distribution of its initial state X(0) and the mechanism of transitions from one state to another.

The distribution of a Markov chain is completely determined by the initial distribution P_0 and one-step transition probabilities p_{ij}. Here P_0 is the probability mass function of X(0),
\[
P_0(x) = P\{X(0) = x\} \qquad \text{for } x \in \{1, 2, \ldots, n\}.
\]

Based on this data, we would like to compute:

• h-step transition probabilities p_{ij}^{(h)};
• P_h, the distribution of states at time h, which is our forecast for X(h);
• the limit of p_{ij}^{(h)} and P_h as h → ∞, which is our long-term forecast.

Indeed, when making forecasts for many transitions ahead, computations become rather lengthy, and thus it is more efficient to take the limit.

Notation:
p_{ij} = P{X(t+1) = j | X(t) = i}, transition probability;
p_{ij}^{(h)} = P{X(t+h) = j | X(t) = i}, h-step transition probability;
P_t(x) = P{X(t) = x}, distribution of X(t), the distribution of states at time t;
P_0(x) = P{X(0) = x}, initial distribution.

Example 10.3 (Weather forecasts). In some town, each day is either sunny or rainy. A sunny day is followed by another sunny day with probability 0.7, whereas a rainy day is followed by a sunny day with probability 0.4.

It rains on Monday. Make forecasts for Tuesday, Wednesday, and Thursday.

Solution. Weather conditions in this problem represent a homogeneous Markov chain with 2 states: state 1 = “sunny” and state 2 = “rainy.” Transition probabilities are:

p_11 = 0.7,  p_12 = 0.3,  p_21 = 0.4,  p_22 = 0.6,

where p12 and p22 were computed by the complement rule.

If it rains on Monday, then Tuesday is sunny with probability p_21 = 0.4 (making a transition from a rainy to a sunny day), and Tuesday is rainy with probability p_22 = 0.6. We can predict a 60% chance of rain.

Wednesday forecast requires 2-step transition probabilities, making one transition from Monday to Tuesday, X(0) to X(1), and another one from Tuesday to Wednesday, X(1) to X(2). We’ll have to condition on the weather situation on Tuesday and use the Law of Total Probability from p. 17,

\[
\begin{aligned}
p_{21}^{(2)} &= P\{\text{Wednesday is sunny} \mid \text{Monday is rainy}\} \\
&= \sum_{i=1}^{2} P\{X(1) = i \mid X(0) = 2\}\, P\{X(2) = 1 \mid X(1) = i\} \\
&= P\{X(1) = 1 \mid X(0) = 2\}\, P\{X(2) = 1 \mid X(1) = 1\} \\
&\quad + P\{X(1) = 2 \mid X(0) = 2\}\, P\{X(2) = 1 \mid X(1) = 2\} \\
&= p_{21} p_{11} + p_{22} p_{21} = (0.4)(0.7) + (0.6)(0.4) = 0.52.
\end{aligned}
\]

By the Complement Rule, p_{21}^{(2)} + p_{22}^{(2)} = 1, so p_{22}^{(2)} = 0.48, and thus we predict a 52% chance of sun and a 48% chance of rain on Wednesday.

For the Thursday forecast, we need to compute 3-step transition probabilities p_{ij}^{(3)} because it takes 3 transitions to move from Monday to Thursday. We have to use the Law of Total Probability, conditioning on both Tuesday and Wednesday. For example, going from rainy Monday to sunny Thursday means going from rainy Monday to either rainy or sunny Tuesday, then to either rainy or sunny Wednesday, and finally, to sunny Thursday,

\[
p_{21}^{(3)} = \sum_{i=1}^{2} \sum_{j=1}^{2} p_{2i}\, p_{ij}\, p_{j1}.
\]
This corresponds to a sequence of states 2 → i → j → 1. However, we have already computed the 2-step transition probabilities p_{21}^{(2)} and p_{22}^{(2)}, describing transitions from Monday to Wednesday. It remains to add one transition to Thursday; hence,

\[
p_{21}^{(3)} = p_{21}^{(2)} p_{11} + p_{22}^{(2)} p_{21} = (0.52)(0.7) + (0.48)(0.4) = 0.556.
\]

So, we predict a 55.6% chance of sun on Thursday and a 44.4% chance of rain. ♦

The following transition diagram (Figure 10.1) reflects the behavior of this Markov chain. Arrows represent all possible one-step transitions, along with the corresponding probabilities. Check this diagram against the transition probabilities stated in Example 10.3. To obtain, say, a 3-step transition probability p_{21}^{(3)}, find all 3-arrow paths from state 2 "rainy" to state 1 "sunny." Multiply probabilities along each path and add over all 3-step paths.

Example 10.4 (Weather, continued). Suppose now that it does not rain yet, but meteorologists predict an 80% chance of rain on Monday. How does this affect our forecasts?

In Example 10.3, we have computed forecasts under the condition of rain on Monday.

FIGURE 10.1: Transition diagram for the Markov chain in Example 10.3.

Now, a sunny Monday (state 1) is also possible. Therefore, in addition to probabilities p_{2j}^{(h)}, we also need to compute p_{1j}^{(h)} (say, using the transition diagram, see Figure 10.1),

\[
\begin{aligned}
p_{11}^{(2)} &= (0.7)(0.7) + (0.3)(0.4) = 0.61, \\
p_{11}^{(3)} &= (0.7)^3 + (0.7)(0.3)(0.4) + (0.3)(0.4)(0.7) + (0.3)(0.6)(0.4) = 0.583.
\end{aligned}
\]

The initial distribution P0(x) is given as

P_0(1) = P{sunny Monday} = 0.2,  P_0(2) = P{rainy Monday} = 0.8.

Then, for each forecast, we use the Law of Total Probability, conditioning on the weather on Monday,

\[
\begin{aligned}
P_1(1) &= P\{X(1) = 1\} = P_0(1)\, p_{11} + P_0(2)\, p_{21} = 0.46 && \text{for Tuesday} \\
P_2(1) &= P\{X(2) = 1\} = P_0(1)\, p_{11}^{(2)} + P_0(2)\, p_{21}^{(2)} = 0.538 && \text{for Wednesday} \\
P_3(1) &= P\{X(3) = 1\} = P_0(1)\, p_{11}^{(3)} + P_0(2)\, p_{21}^{(3)} = 0.5614 && \text{for Thursday}
\end{aligned}
\]
These are probabilities of a sunny day (state 1), respectively, on Tuesday, Wednesday, and

Thursday. Then, the chance of rain (state 2) on these days is P1(2) = 0.54, P2(2) = 0.462, and P3(2) = 0.4386. ♦

Noticeably, more remote forecasts require lengthier computations. For a t-day-ahead forecast, we have to account for all t-step paths on the diagram in Figure 10.1. Alternatively, we can use the Law of Total Probability, conditioning on all the intermediate states X(1), X(2), ..., X(t−1). To simplify the task, we shall employ matrices.

10.3 Matrix approach

All one-step transition probabilities p_{ij} can be conveniently written in an n × n transition probability matrix
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1n} \\
p_{21} & p_{22} & \cdots & p_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1} & p_{n2} & \cdots & p_{nn}
\end{pmatrix}.
\]

The entry on the intersection of the i-th row and the j-th column is pij , the transition probability from state i to state j.

From each state, a Markov chain makes a transition to one and only one state. States- destinations are disjoint and exhaustive events, therefore, each row total equals 1,

\[
p_{i1} + p_{i2} + \ldots + p_{in} = 1. \qquad (10.3)
\]

We can also say that probabilities pi1,pi2,...,pin form the conditional distribution of X(1), given X(0), so they have to add to 1.

In general, this does not hold for column totals. Some states may be "more favorable" than others: then they are visited more often than others, and their column totals are larger. Matrices with property (10.3) are called stochastic.

Similarly, h-step transition probabilities can be written in an h-step transition probability matrix
\[
P^{(h)} = \begin{pmatrix}
p_{11}^{(h)} & p_{12}^{(h)} & \cdots & p_{1n}^{(h)} \\
p_{21}^{(h)} & p_{22}^{(h)} & \cdots & p_{2n}^{(h)} \\
\vdots & \vdots & \ddots & \vdots \\
p_{n1}^{(h)} & p_{n2}^{(h)} & \cdots & p_{nn}^{(h)}
\end{pmatrix}.
\]
This matrix is also stochastic because each row represents the conditional distribution of X(h), given X(0) (which is a good way to check our results if we compute P^{(h)} by hand).

Computing h-step transition probabilities

A simple matrix formula connects the matrices P^{(h)} and P. Let's start with the 2-step transition probabilities.

By the Law of Total Probability, conditioning and adding over all values of k = X(1),

\[
p_{ij}^{(2)} = P\{X(2) = j \mid X(0) = i\}
= \sum_{k=1}^{n} P\{X(1) = k \mid X(0) = i\}\, P\{X(2) = j \mid X(1) = k\}
= \sum_{k=1}^{n} p_{ik}\, p_{kj}
= (p_{i1}, \ldots, p_{in}) \begin{pmatrix} p_{1j} \\ \vdots \\ p_{nj} \end{pmatrix}.
\]
Each probability p_{ij}^{(2)} is computed as a sum of P{i → k → j} over all 2-step paths leading from state i to state j (also, see Example 10.3). As a result, each p_{ij}^{(2)} is a product of the i-th row and the j-th column of matrix P. Hence, the entire 2-step transition probability matrix is

2-step transition probability matrix:  P^{(2)} = P · P = P².

Further, h-step transition probabilities can be obtained from (h−1)-step transition probabilities by conditioning on X(h−1) = k,
\[
p_{ij}^{(h)} = P\{X(h) = j \mid X(0) = i\}
= \sum_{k=1}^{n} P\{X(h-1) = k \mid X(0) = i\}\, P\{X(h) = j \mid X(h-1) = k\}
= \sum_{k=1}^{n} p_{ik}^{(h-1)} p_{kj}
= \left( p_{i1}^{(h-1)}, \ldots, p_{in}^{(h-1)} \right) \begin{pmatrix} p_{1j} \\ \vdots \\ p_{nj} \end{pmatrix}.
\]
This time, we consider all h-step paths from i to j. They consist of all (h−1)-step paths from i to some state k, followed by one step from k to j. Now the result is the i-th row of matrix P^{(h−1)} multiplied by the j-th column of P. Hence, P^{(h)} = P^{(h−1)} · P, and we have the general formula

h-step transition probability matrix:  P^{(h)} = P · P · ... · P (h times) = P^h.

FIGURE 10.2: Transition diagram for the Markov chain in Example 10.5.

Example 10.5 (Shared device). A computer is shared by 2 users who send tasks to a computer remotely and work independently. At any minute, any connected user may disconnect with probability 0.5, and any disconnected user may connect with a new task with probability 0.2. Let X(t) be the number of concurrent users at time t (minutes). This is a Markov chain with 3 states: 0, 1, and 2.

Compute transition probabilities. Suppose X(0) = 0, i.e., there are no users at time t = 0. Then X(1) is the number of new connections within the next minute. It has Binomial(2,0.2) distribution, therefore,

p_00 = (.8)² = .64,  p_01 = 2(.2)(.8) = .32,  p_02 = (.2)² = .04.

Next, suppose X(0) = 1, i.e., one user is connected, and the other is not. The number of new connections is Binomial(1,0.2), and the number of disconnections is Binomial(1,0.5). Considering all the possibilities, we obtain (verify)

p10 = (.8)(.5) = .40, p11 = (.2)(.5)+(.8)(.5) = .50, p12 = (.2)(.5) = .10.

Finally, when X(0) = 2, no new users can connect, and the number of disconnections is Binomial(2,0.5), so that

p_20 = .25,  p_21 = .50,  p_22 = .25.

We obtain the following transition probability matrix,
\[
P = \begin{pmatrix} .64 & .32 & .04 \\ .40 & .50 & .10 \\ .25 & .50 & .25 \end{pmatrix}.
\]
The transition diagram corresponding to this matrix is shown in Figure 10.2.

The 2-step transition probability matrix is then computed as
\[
P^2 = \begin{pmatrix} .64 & .32 & .04 \\ .40 & .50 & .10 \\ .25 & .50 & .25 \end{pmatrix}
\begin{pmatrix} .64 & .32 & .04 \\ .40 & .50 & .10 \\ .25 & .50 & .25 \end{pmatrix}
= \begin{pmatrix} .5476 & .3848 & .0676 \\ .4810 & .4280 & .0910 \\ .4225 & .4550 & .1225 \end{pmatrix}.
\]

For example, if both users are connected at 10:00, then at 10:02 there will be no users with probability p_{20}^{(2)} = 0.4225, one user with probability p_{21}^{(2)} = 0.4550, and two users with probability p_{22}^{(2)} = 0.1225. Check that both matrices P and P² are stochastic. ♦

Computing the distribution of X(h)

The distribution of states after h transitions, or the probability mass function of X(h), can be written in a 1 × n matrix,
\[
P_h = (P_h(1), \ldots, P_h(n)).
\]
By the Law of Total Probability, this time conditioning on X(0) = k, we compute

\[
P_h(j) = P\{X(h) = j\} = \sum_k P\{X(0) = k\}\, P\{X(h) = j \mid X(0) = k\}
= \sum_k P_0(k)\, p_{kj}^{(h)}
= (P_0(1), \ldots, P_0(n)) \begin{pmatrix} p_{1j}^{(h)} \\ \vdots \\ p_{nj}^{(h)} \end{pmatrix}.
\]
Each probability P_h(j) is obtained when the entire row P_0 (the initial distribution of X) is multiplied by the entire j-th column of matrix P^{(h)}. Hence,

Distribution of X(h):  P_h = P_0 P^h.  (10.4)

Example 10.6 (Shared device, continued). If we know that there are 2 users con- nected at 10:00, we can write the initial distribution as

P0 = (0, 0, 1) .

Then the distribution of the number of users at 10:02, after h = 2 transitions, is computed as
\[
P_2 = P_0 P^2 = (0,\ 0,\ 1) \begin{pmatrix} .5476 & .3848 & .0676 \\ .4810 & .4280 & .0910 \\ .4225 & .4550 & .1225 \end{pmatrix}
= (.4225,\ .4550,\ .1225),
\]
just as we concluded at the end of Example 10.5.

Suppose that all states are equally likely at 10:00. How can we compute the probability of 1 connected user at 10:02? Here, the initial distribution is

P_0 = (1/3, 1/3, 1/3).

The distribution of X(2) is

\[
P_2 = P_0 P^2 = (1/3,\ 1/3,\ 1/3) \begin{pmatrix} .5476 & .3848 & .0676 \\ .4810 & .4280 & .0910 \\ .4225 & .4550 & .1225 \end{pmatrix}
= (\ldots,\ .4226,\ \ldots).
\]
Thus, P{X(2) = 1} = 0.4226. We only computed P_2(1); the question did not require computation of the entire distribution P_2. ♦

In practice, knowing the distribution of the number of users at 10:00, one would rarely be interested in the distribution at 10:02. How should one figure the distribution at 11:00, the next day, or the next month?

In other words, how does one compute the distribution of X(h) for large h? A direct solution is P_h = P_0 · P^h, which can be computed for moderate n and h. For example, the distribution of the number of users at 11:00, h = 60 transitions after 10:00, can be computed in MATLAB as

P = [ .64 .32 .04
      .40 .50 .10
      .25 .50 .25 ];
P0 = [ 1/3 1/3 1/3 ];
h = 60;
P0 * P^h

For large h, one would really like to take the limit of P_h as h → ∞ instead of a tedious computation of P_0 · P^h. We learn how to do it in the next section.

10.4 Steady-state distribution

This section discusses the distribution of states of a Markov chain after a large number of transitions.

DEFINITION 10.6

A collection of limiting probabilities
\[
\pi_x = \lim_{h \to \infty} P_h(x)
\]
is called a steady-state distribution of a Markov chain X(t).

When this limit exists, it can be used as a forecast of the distribution of X after many tran- sitions. A fast system (say, a central processing unit with a clock rate of several gigahertz), will go through a very large number of transitions very quickly. Then, its distribution of states at virtually any time is the steady-state distribution.

Computing the steady-state distribution

When a steady-state distribution π exists, it can be computed as follows. We notice that π is a limit of not only P_h but also P_{h+1}. The latter two are related by the formula
\[
P_h P = P_0 P^h P = P_0 P^{h+1} = P_{h+1}.
\]
Taking the limit of P_h and P_{h+1} as h → ∞, we obtain
\[
\pi P = \pi. \qquad (10.5)
\]

Then, solving (10.5) for π, we get the steady-state distribution of a Markov chain with a transition probability matrix P .

The system of steady-state equations (10.5) consists of n equations with n unknowns which are probabilities of n states. However, this system is singular; it has infinitely many solutions. That is because both sides of the system (10.5) can be multiplied by any constant C, and thus, any multiple of π is also a solution of (10.5).

There is yet one more condition that we have not used. Being a distribution, all the probabilities π_x must add to 1,
\[
\pi_1 + \ldots + \pi_n = 1.
\]

Only one solution of (10.5) can satisfy this condition, and this is the steady-state distribu- tion.

Steady-state distribution:
\[
\pi = \lim_{h \to \infty} P_h
\]
is computed as the solution of
\[
\pi P = \pi, \qquad \sum_x \pi_x = 1.
\]

Example 10.7 (Weather, continued). In Example 10.3 on p. 134, the transition probability matrix of sunny and rainy days is
\[
P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}.
\]

The steady-state equation for this Markov chain is

πP = π,

or
\[
(\pi_1, \pi_2) \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}
= (0.7\pi_1 + 0.4\pi_2,\ 0.3\pi_1 + 0.6\pi_2) = (\pi_1, \pi_2).
\]
We obtain a system of equations,

\[
\begin{cases} 0.7\pi_1 + 0.4\pi_2 = \pi_1 \\ 0.3\pi_1 + 0.6\pi_2 = \pi_2 \end{cases}
\Leftrightarrow
\begin{cases} 0.4\pi_2 = 0.3\pi_1 \\ 0.3\pi_1 = 0.4\pi_2 \end{cases}
\Leftrightarrow
\pi_2 = \frac{3}{4}\pi_1.
\]
We see that the two equations in our system reduced to one. This will always happen: one equation will follow from the others, because the system πP = π is singular. It remains to use the normalizing equation Σ_x π_x = 1,
\[
\pi_1 + \pi_2 = \pi_1 + \frac{3}{4}\pi_1 = \frac{7}{4}\pi_1 = 1,
\]
from where

π_1 = 4/7 and π_2 = 3/7.

In the long history of this city, 4/7 ≈ 57% of days are sunny, and 3/7 ≈ 43% are rainy. ♦

Example 10.8 (Shared device, continued). In Example 10.5 on p. 139, the transition probability matrix for the number of concurrent users is
\[
P = \begin{pmatrix} .64 & .32 & .04 \\ .40 & .50 & .10 \\ .25 & .50 & .25 \end{pmatrix}.
\]

The steady-state equations are:

\[
\begin{cases}
.64\pi_0 + .40\pi_1 + .25\pi_2 = \pi_0 \\
.32\pi_0 + .50\pi_1 + .50\pi_2 = \pi_1 \\
.04\pi_0 + .10\pi_1 + .25\pi_2 = \pi_2
\end{cases}
\;\Leftrightarrow\;
\begin{cases}
-.36\pi_0 + .40\pi_1 + .25\pi_2 = 0 \\
.32\pi_0 - .50\pi_1 + .50\pi_2 = 0 \\
.04\pi_0 + .10\pi_1 - .75\pi_2 = 0
\end{cases}
\]

Solving this system by method of elimination, we express π2 from the first equation and substitute into the other equations,

π2 =1.44π0 1.6π1 π2 =1.44π0 1.6π1 .32π .50π + .50(1.44−π 1.6π ) = 0 ; 1.04π 1.3π− = 0  0 1 0 1  0 1 .04π +− .10π .75(1.44π − 1.6π ) = 0 1.04π −+1.3π = 0  0 1 − 0 − 1  − 0 1 The last two equations are equivalent. This was anticipated; one equation should always follow from the others, so we are “probably” on a right track. 144 STAT 618 Bayesian Statistics Lecture Notes

Express π_1 from the second equation and substitute into the first one,
\[
\begin{cases} \pi_1 = .8\pi_0 \\ \pi_2 = 1.44\pi_0 - 1.6(.8\pi_0) \end{cases}
\;\Leftrightarrow\;
\begin{cases} \pi_1 = .8\pi_0 \\ \pi_2 = .16\pi_0 \end{cases}
\]
Finally, use the normalizing equation,

π0 + π1 + π2 = π0 + .8π0 + .16π0 = 1.96π0 = 1,

from where we compute the answer,

\[
\begin{cases}
\pi_0 = 1/1.96 = .5102 \\
\pi_1 = .8(.5102) = .4082 \\
\pi_2 = .16(.5102) = .0816
\end{cases}
\]
♦
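The same steady-state distribution can be found numerically. A short R sketch: replace one of the (linearly dependent) equations of πP = π with the normalizing condition and solve the resulting linear system.

P <- matrix(c(.64, .32, .04,
              .40, .50, .10,
              .25, .50, .25), nrow = 3, byrow = TRUE)
A <- t(P) - diag(3)          # rows of A %*% pi = 0 encode pi P = pi
A[3, ] <- 1                  # replace one equation by pi0 + pi1 + pi2 = 1
pi_ss <- solve(A, c(0, 0, 1))
pi_ss                        # 0.5102 0.4082 0.0816, matching the hand computation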

The limit of P^h

Then, what will the h-step transition probabilities be for large h? It turns out that matrix P^{(h)} also has a limit as h → ∞, and the limiting matrix has the form
\[
\Pi = \lim_{h \to \infty} P^{(h)} = \begin{pmatrix}
\pi_1 & \pi_2 & \cdots & \pi_n \\
\pi_1 & \pi_2 & \cdots & \pi_n \\
\vdots & \vdots & & \vdots \\
\pi_1 & \pi_2 & \cdots & \pi_n
\end{pmatrix}.
\]
All the rows of the limiting matrix Π are equal, and they consist of the steady-state probabilities π_x! How can this phenomenon be explained? First, the forecast for some very remote future should not depend on the current state X(0). (Say, our weather forecast for the next century should not depend on the weather we have today.) Therefore, in the limit, p_{ik}^{(h)} = p_{jk}^{(h)} for all i, j, k, and this is why all rows of Π coincide.

Second, our forecast, independent of X(0), will just be given in terms of long-term propor- tions, that is, πx. Indeed, if your next-year vacation is in August, then the best weather forecast you can get will likely be given as the historical average for this time of the year.

Steady state

What is the steady state of a Markov chain? Suppose the system has reached its steady state, so that the current distribution of states is $P_t = \pi$. A system makes one more transition, and the distribution becomes $P_{t+1} = \pi P$. But $\pi P = \pi$, and thus, $P_t = P_{t+1}$. We see that in a steady state, transitions do not affect the distribution. A system may go from one state to another, but the distribution of states does not change. In this sense, it is steady.

Existence of a steady-state distribution. Regular Markov chains

As we know from Calculus, there are situations when a limit simply does not exist. Similarly, there are Markov chains with no steady-state distribution.

Example 10.9 (Periodic Markov chain, no steady-state distribution). In chess, a knight can only move to a field of a different color, from white to black, and from black to white. Then, the transition probability matrix of the color of its field is

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

Computing $P^2$, $P^3$, etc., we find that

$$P^{(h)} = P^h = \begin{cases} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} & \text{for all odd } h, \\[2ex] \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} & \text{for all even } h. \end{cases}$$

Indeed, after any odd number of moves, the knight's field will change its color, and after any even number of moves, it will return to the initial color. Thus, there is no limit of $P_h$ and $P^{(h)}$. ♦

The Markov chain in Example 10.9 is periodic with period 2 because X(t)= X(t +2) for all t with probability 1. Periodic Markov chains cannot be regular; from any state, some h-step transitions are possible and some are not, depending on whether or not h is divisible by the period.

There are other situations when steady-state probabilities cannot be found. What we need is a criterion for the existence of a steady-state distribution.

DEFINITION 10.7 A Markov chain is regular if $p^{(h)}_{ij} > 0$ for some $h$ and all $i, j$. That is, for some $h$, matrix $P^{(h)}$ has only non-zero entries, and $h$-step transitions from any state to any state are possible.

Any regular Markov chain has a steady-state distribution.

Example 10.10. Markov chains in Examples 10.3 and 10.5 are regular because all transitions are possible for $h = 1$ already, and matrix $P$ does not contain any zeros. ♦

FIGURE 10.3: Transition diagram for a regular Markov chain in Example 10.11.

Example 10.11. A Markov chain with transition probability matrix

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0.9 & 0 & 0 & 0.1 \end{pmatrix}$$

is also regular. Matrix $P$ contains zeros, and so do $P^2$, $P^3$, $P^4$, and $P^5$. The 6-step transition probability matrix

$$P^{(6)} = \begin{pmatrix} .009 & .090 & .900 & .001 \\ .001 & .009 & .090 & .900 \\ .810 & .001 & .009 & .180 \\ .162 & .810 & .001 & .027 \end{pmatrix}$$

contains no zeros and proves regularity of this Markov chain.

In fact, computation of all $P^h$ up to $h = 6$ is not required in this problem. Regularity can also be seen from the transition diagram in Figure 10.3. Based on this diagram, any state $j$ can be reached in 6 steps from any state $i$. Indeed, moving counterclockwise through this figure, we can reach state 4 from any state $i$ in $\le 3$ steps. Then, we can reach state $j$ from state 4 in $\le 3$ additional steps, for the total of $\le 6$ steps. If we can reach state $j$ from state $i$ in fewer than 6 steps, we just use the remaining steps circling around state 4. For example, state 2 is reached from state 1 in 6 steps as follows:

$$1 \to 2 \to 3 \to 4 \to 4 \to 1 \to 2.$$

Notice that we don't have to compute the $p^{(h)}_{ij}$. We only need to verify that they are all positive for some $h$. ♦
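If we do want to check regularity by brute force, a few matrix powers suffice; a short sketch (Python/NumPy assumed for illustration):

```python
import numpy as np

P = np.array([[0,   1, 0, 0  ],
              [0,   0, 1, 0  ],
              [0,   0, 0, 1  ],
              [0.9, 0, 0, 0.1]])
for h in range(1, 8):
    # the chain is regular once P^(h) has no zero entries; this
    # happens for the first time at h = 6
    print(h, bool((np.linalg.matrix_power(P, h) > 0).all()))
```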

Example 10.12 (Irregular Markov chain, absorbing states). When there is a state $i$ with $p_{ii} = 1$, a Markov chain cannot be regular. There is no exit from state $i$; therefore, $p^{(h)}_{ij} = 0$ for all $h$ and all $j \ne i$. Such a state is called absorbing. For example, state 4 in Figure 10.4a is absorbing, therefore, the Markov chain is irregular.

FIGURE 10.4: Absorbing states and absorbing zones (Example 10.12).

There may be several absorbing states or an entire absorbing zone, from which the remaining states can never be reached. For example, states 3, 4, and 5 in Figure 10.4b form an absorbing zone, some kind of a Bermuda triangle. When this process finds itself in the set $\{3, 4, 5\}$, there is no route from there to the set $\{1, 2\}$. As a result, probabilities $p^{(h)}_{31}$, $p^{(h)}_{52}$, and some others equal 0 for all $h$. Although all $p_{ij}$ may be less than 1, such Markov chains are still irregular.

Notice that both Markov chains do have steady-state distributions. The first process will eventually reach state 4 and will stay there for good. Therefore, the limiting distribution of $X(h)$ is $\pi = \lim_{h\to\infty} P_h = (0, 0, 0, 1)$. The second Markov chain will eventually leave states 1 and 2 for good, thus its limiting (steady-state) distribution has the form $\pi = (0, 0, \pi_3, \pi_4, \pi_5)$. ♦

Conclusion

This section gives us an important method of analyzing rather complicated stochastic systems. Once the Markov property of a process is established, it only remains to find its one-step transition probabilities. Then, the steady-state distribution can be computed, and thus, we obtain the distribution of the process at any time, after a sufficient number of transitions.

Chapter 11

Empirical and Hierarchical Bayes Analysis

11.1 Parametric Empirical Bayes Approach ...... 149
11.2 Nonparametric Empirical Bayes Approach ...... 150
11.3 Hierarchical Bayes Approach ...... 152

In this chapter, we discuss the usual situations when the prior distribution is not perfectly known. Several approaches are proposed to handle these cases:

• Empirical Bayes rules estimate the prior distribution from data. There are

– Parametric Empirical Bayes rules, where a family of prior distributions is assumed, and its parameters are estimated from data;

– Nonparametric Empirical Bayes rules, where no model is assumed for the prior distribution. Instead, the marginal distribution of data is estimated and used to construct the Bayes rule, bypassing the posterior distribution.

• Hierarchical Bayes rules, where unknown parameters of the prior distribution (called hyper-parameters) are assumed to have their own prior distributions (called hyper-priors). Then, these unknown hyper-parameters are integrated out.

• Bayesian robustness, not covered in this course but covered in chap. 4 of Berger, 1985. Its purpose is to find such decision rules that perform uniformly well within a class of prior distributions.

11.1 Parametric Empirical Bayes Approach

Suppose that it is too ambitious or unrealistic to claim that we know the prior distribution $\pi(\theta)$ completely. Instead, assume that the prior belongs to a given family of distributions $\{\pi(\theta \mid \alpha, \beta)\}$, such as Normal or Gamma, possibly the conjugate family, but its parameters $\alpha$ and $\beta$ are unknown. Parameters of the prior distribution are called hyper-parameters. Parametric empirical Bayes rules will then use estimators of these parameters.

A naive parametric empirical Bayes procedure


– integrates $\theta$ out to obtain the distribution of data given the hyper-parameters but free of the main parameter $\theta$, namely $\pi(X \mid \alpha, \beta)$;

– estimates the hyper-parameters from the distribution $\pi(X \mid \alpha, \beta)$ using some standard method;

– simply plugs the estimates $\hat\alpha$ and $\hat\beta$ into the prior distribution $\pi(\theta \mid \alpha, \beta)$ and "pretends" that these were the true hyper-parameters.

Example 11.1. Assume the data come from a Normal distribution with unknown mean $\theta$ and known variance $\sigma^2$, where $\theta$ has a Normal($\mu, \tau^2$) prior distribution, with both $\mu$ and $\tau^2$ being unknown. These $\mu$ and $\tau^2$ are our hyper-parameters.

Then the distribution of $X_1, \ldots, X_n$ given $\mu$ and $\tau^2$ is also Normal, with parameters

$$E(X \mid \mu, \tau^2) = \mu, \qquad Var(X \mid \mu, \tau^2) = \sigma^2 + \tau^2.$$

The standard estimates of $E(X \mid \mu, \tau^2)$ and $Var(X \mid \mu, \tau^2)$ are $\bar X$ and $s^2_X$, the sample mean and sample variance. From this, we can estimate the hyper-parameters as

$$\hat\mu = \bar X, \qquad \hat\tau^2 = s^2_X - \sigma^2.$$

Then, with these estimates, the estimated prior distribution $\pi(\theta \mid \hat\mu, \hat\tau^2)$ is used instead of the unknown $\pi(\theta \mid \mu, \tau^2)$, and we proceed as in the usual Bayesian analysis, computing the posterior and doing further inference. ♦
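A small simulation sketch of this naive procedure (Python/NumPy assumed; each observation is given its own $\theta$ drawn from the prior, so the $X_i$ are i.i.d. with the marginal moments above, and all numeric values below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(618)
sigma = 2.0                       # known standard deviation of the data
mu_true, tau_true = 5.0, 1.5      # hyper-parameters, unknown to the statistician

n = 500
theta = rng.normal(mu_true, tau_true, size=n)   # theta_i ~ Normal(mu, tau^2)
X = rng.normal(theta, sigma)                    # X_i | theta_i ~ Normal(theta_i, sigma^2)

# naive parametric empirical Bayes estimates of the hyper-parameters
mu_hat = X.mean()                               # estimates E(X) = mu
tau2_hat = max(X.var(ddof=1) - sigma**2, 0.0)   # s^2 - sigma^2 can be negative; truncate at 0
print(mu_hat, tau2_hat)
```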

Some obvious criticism of this naive approach is that we use estimated hyper-parameters as if they were true values. In fact, these estimates obtained from the data will fit the data “too well”.

Often historical data are used to estimate the hyper-parameters.

Also, besides this naive approach, there are other empirical Bayes techniques that compensate for the uncertainty in these estimators.

11.2 Nonparametric Empirical Bayes Approach

As always in nonparametric statistics, here we assume no model for the prior distribution. We simply do not pretend to know anything about the prior. However, we choose not to place a non-informative prior on the parameter $\theta$ either, because we believe that $\theta$ does not have such a high degree of uncertainty. It has a proper prior distribution which is just unknown to us.

Nonparametric empirical Bayes procedures make use of the marginal distribution of data. This marginal distribution m(X) is estimated from the data and then used to construct the Bayes rule bypassing the posterior distribution. That is, this first time around, we will not be constructing the posterior distribution (for which we would have needed the form of the prior which we don’t have). We’ll go to the Bayes rule directly.

Example 11.2. For example, suppose we have observed an Exponential variable $X$ with parameter $\theta$ and density $f(x \mid \theta) = \theta e^{-\theta x}$.

The marginal distribution of $X$ is then $m(x) = E^{\pi(\theta)}\left(\theta e^{-\theta x}\right)$, whereas the Bayes rule, under the squared-error loss, is

$$\delta(X) = E(\theta \mid X) = \frac{E^{\pi(\theta)}\left(\theta^2 e^{-\theta X}\right)}{m(X)} = -\frac{m'(X)}{m(X)},$$

where $m'(x)$ is the derivative of the marginal density of $X$ (check that this is true!).

Then we estimate the marginal density (for example, with a histogram) or estimate the cdf with an empirical cdf. The nonparametric empirical Bayes rule is then

$$\delta_{NPEB}(X) = -\frac{\hat m'(X)}{\hat m(X)}. \qquad ♦$$

Example 11.3. As another example, consider the Poisson distribution with parameter $\theta$, whose prior is unknown.

If we knew the prior, the Bayes rule would have been

$$\delta = E(\theta \mid X) = \int \theta\, \pi(\theta \mid X)\, d\theta = \frac{\int \theta f(X \mid \theta)\pi(\theta)\, d\theta}{m(X)} = \frac{\int \theta \cdot e^{-\theta}\theta^X/X! \cdot \pi(\theta)\, d\theta}{m(X)}$$
$$= \frac{1}{X!\, m(X)} \int \theta^{X+1} e^{-\theta} \pi(\theta)\, d\theta = \frac{(X+1)!}{X!\, m(X)} \int f(X+1 \mid \theta)\, \pi(\theta)\, d\theta = \frac{(X+1)\, m(X+1)}{m(X)}$$

Again, we estimate the marginal density $m(X)$ and construct the nonparametric empirical Bayes rule as

$$\delta_{NPEB}(X) = \frac{(X+1)\, \hat m(X+1)}{\hat m(X)}. \qquad ♦$$
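This is the classical Robbins estimator. A simulation sketch of it in action (Python/NumPy assumed; the Gamma prior below is used only to generate the data and is treated as unknown by the procedure):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.gamma(shape=3.0, scale=1.0, size=100_000)  # "unknown" prior on theta
X = rng.poisson(theta)

m_hat = np.bincount(X) / X.size     # empirical marginal pmf of the data

def robbins(x):
    """Nonparametric empirical Bayes estimate of E(theta | X = x)."""
    if x + 1 >= m_hat.size or m_hat[x] == 0:
        return float("nan")         # not enough data to estimate the ratio
    return (x + 1) * m_hat[x + 1] / m_hat[x]

for x in range(6):
    # with this Gamma(3, 1) prior, the true Bayes rule is (x + 3)/2
    print(x, round(robbins(x), 3))
```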

11.3 Hierarchical Bayes Approach

The hierarchical Bayes approach is similar to empirical Bayes; however, the unknown hyper-parameters are treated in a perfectly Bayesian way. They are assumed to have their own prior distributions, hyper-priors. Then, these unknown hyper-parameters are integrated out, and the posterior distribution $\pi(\theta \mid X)$ is derived.

In general, consider the following model:

Data: $X_1, \ldots, X_n \sim f(x \mid \theta)$, the model
Parameter: $\theta \sim \pi(\theta \mid \lambda)$, the prior
Hyper-parameter: $\lambda \sim p(\lambda)$ or $p(\lambda \mid \alpha, \beta)$, the hyper-prior

Doing hierarchical Bayes analysis manually, on paper, usually requires several steps of integration. The posterior distribution is

$$\pi(\theta \mid X) = \int \pi(\theta \mid \lambda, X)\, p(\lambda \mid X)\, d\lambda = \int \frac{f(X \mid \theta)\pi(\theta \mid \lambda)}{m(X \mid \lambda)} \cdot \frac{m(X \mid \lambda)\, p(\lambda)}{m(X)}\, d\lambda = \frac{\displaystyle\int f(X \mid \theta)\pi(\theta \mid \lambda)p(\lambda)\, d\lambda}{\displaystyle\int\!\!\int f(X \mid \theta)\pi(\theta \mid \lambda)p(\lambda)\, d\theta\, d\lambda}$$

Practical applications often involve rather advanced modeling with multiple parameters and several levels of hyper-priors. While it may be difficult to handle analytically, these models are easy to enter in WinBUGS or other Gibbs sampling software that will then produce a large number of generated values $\theta_1, \ldots, \theta_N$ from the posterior distribution $\pi(\theta \mid X)$.
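For a flavor of what such software does internally, here is a minimal Gibbs-sampler sketch in Python/NumPy (WinBUGS itself is not shown) for the toy model $X_i \mid \theta \sim$ Normal($\theta, \sigma^2$), $\theta \mid \mu \sim$ Normal($\mu, \tau^2$), $\mu \sim$ Normal($m_0, v_0$), where $\sigma^2$, $\tau^2$, $m_0$, $v_0$ are all assumed known for illustration; both full conditionals follow from Normal-Normal conjugacy.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(4.0, 2.0, size=20)   # toy data; sigma is treated as known
sigma2, tau2 = 4.0, 1.0             # known variances of the model and the prior
m0, v0 = 0.0, 100.0                 # vague hyper-prior on mu

n, xbar = X.size, X.mean()
theta, mu, draws = 0.0, 0.0, []
for t in range(5000):
    # theta | mu, X: Normal with precision n/sigma2 + 1/tau2
    prec = n / sigma2 + 1 / tau2
    theta = rng.normal((n * xbar / sigma2 + mu / tau2) / prec, np.sqrt(1 / prec))
    # mu | theta: Normal with precision 1/tau2 + 1/v0
    prec = 1 / tau2 + 1 / v0
    mu = rng.normal((theta / tau2 + m0 / v0) / prec, np.sqrt(1 / prec))
    draws.append(theta)

print(np.mean(draws[1000:]))        # posterior mean of theta after burn-in
```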

Berger 4.5-4.6; Christensen 4.10-4.13, 9.5

Chapter 12

Minimax Decisions and Game Theory

12.1 General concepts ...... 153
12.2 Ann, Jim, and Mark problem ...... 154

12.1 General concepts

As we already know, decision $\delta$ is minimax if it minimizes $\max_\theta R(\theta, \delta)$, the worst frequentist risk over all $\theta \in \Theta$. That is,

$$\max_\theta R(\theta, \delta_{minimax}) = \min_{\delta \in \mathcal{A}}\, \max_\theta R(\theta, \delta).$$

This would be the best strategy in statistical games, playing against an intelligent opponent who controls $\theta$ and delivers the worst-case scenario as long as it exists.

The resulting value of the risk is called the upper value of the game,

$$\overline{V} = \min_{\delta \in \mathcal{A}}\, \max_\theta R(\theta, \delta)$$

Symmetrically, since our "opponents" are playing with $\theta$, their best strategy is called maximin if it maximizes our risk, assuming that we play with our best strategy. That is,

$$\min_\delta R(\theta_{maximin}, \delta) = \max_\theta\, \min_\delta R(\theta, \delta).$$

The resulting value of the risk is called the lower value of the game,

$$\underline{V} = \max_\theta\, \min_\delta R(\theta, \delta)$$

It is not hard to see that $\underline{V} \le \overline{V}$ all the time. Often (under certain conditions), $\underline{V} = \overline{V}$, in which case it is said that the game has a value, and the value of the game is

$$V = \underline{V} = \overline{V}.$$

A game is called a zero-sum game if the loss of one player is the gain of the other player. Then, the sum of risks of all the players is 0, which means that (1) nobody outside of this


game can gain or lose from it, and (2) the players have no incentive to collaborate. By helping your opponent, you will always hurt yourself.

Only the simplest games are played with "deterministic" strategies $\delta$ and $\theta$. The interesting and most reasonable games appear when players use randomized strategies: choosing a whole prior distribution $\pi(\theta)$ instead of choosing just $\theta$, and choosing a randomized rule $\delta^\star$, which is a distribution of decisions on the action space $\mathcal{A}$.

Then, a randomized decision $\delta^\star$ is minimax if

$$\max_{\pi(\theta)} r(\pi, \delta^\star_{minimax}) = \min_{\delta^\star}\, \max_{\pi(\theta)} r(\pi, \delta^\star) = \min_{\delta^\star}\, \max_\theta r(\theta, \delta^\star),$$

and the prior distribution $\pi(\theta)$ is maximin if it maximizes the minimum risk,

$$\min_{\delta^\star} r(\pi_{maximin}, \delta^\star) = \max_{\pi(\theta)}\, \min_{\delta^\star} r(\pi, \delta^\star) = \max_{\pi(\theta)}\, \min_{\delta} r(\pi, \delta).$$

The Minimax Theorem establishes when a zero-sum game has a value, in which case the maximin strategy of Player II is the worst (least favorable) prior distribution for Player I, and the minimax strategy of Player I is Bayes with respect to this least favorable prior. Symmetrically, the randomized minimax strategy of Player I is the worst-case scenario for Player II, and the maximin strategy of Player II is Bayes with respect to this case. Also, when equalizer strategies exist for both players (rules with constant risks), these equalizers will be the minimax and maximin strategies.

12.2 Ann, Jim, and Mark problem

Here is a nice example of finding minimax and maximin strategies that are equalizers.

Ann arrives home from work at a random time, uniformly distributed between 4 and 5. She always goes out to dinner at 5. Mark and Jim would each like to take her for dinner. Unfortunately, they each have only one coin for a phone call. Whoever calls first, while Ann is at home, will get the dinner date. The loss function for both Mark and Jim is

$$\begin{aligned} L(\text{taking Ann for dinner}) &= -1 \\ L(\text{other man takes Ann for dinner}) &= 1 \\ L(\text{neither man takes Ann for dinner}) &= 0 \end{aligned}$$

Find the minimax and maximin strategies for Jim and Mark regarding the time to make that telephone call.

SOLUTION.

Let $\delta$ and $\theta$ be the strategies for Jim and Mark, respectively. Obviously, calling before 4 pm or after 5 pm is inadmissible, so let $\delta$ and $\theta$ denote the time, the number of hours after 4 pm, with $\delta, \theta \in [0, 1]$.

The problem involves a random variable $X \sim$ Uniform(0,1), the time (also, the number of hours after 4 pm) when Ann comes home. Given $X$, the loss function for Jim (playing with $\delta$) is

$$L(\delta, \theta \mid X) = \begin{cases} -1 & \text{if } X < \delta \text{ and } (\theta < X \text{ or } \delta < \theta) & \text{(Jim takes Ann for dinner)} \\ 1 & \text{if } X < \theta \text{ and } (\delta < X \text{ or } \theta < \delta) & \text{(Mark takes Ann for dinner)} \\ 0 & \text{otherwise.} \end{cases}$$

Taking the expectation over $X \sim$ Uniform(0,1), the frequentist risk is $R(\delta, \theta) = \theta - 2\delta$ for $\delta < \theta$, and $R(\delta, \theta) = 2\theta - \delta$ for $\delta > \theta$.

Playing with a randomized decision δ∗, Jim is looking for a density function of δ, say, f(t). There is no obviously best choice of f, so let’s search for an equalizer.

The risk for Jim is

$$L(\delta^*, \theta) = \int_0^\theta (\theta - 2t)\, f(t)\, dt + \int_\theta^1 (2\theta - t)\, f(t)\, dt = \text{const}(\theta).$$

We want this risk to be constant in terms of θ to make δ∗ an equalizer. However, the choice of f(t) that achieves this goal is still not clear.

Let’s take a derivative! It is simpler than an integral, right? To have a constant risk, the derivative must equal 0.

Differentiating integrals with variable limits$^1$, we get

$$L'(\delta^*, \theta) = (\theta - 2\theta)f(\theta) + \int_0^\theta f(t)\, dt - (2\theta - \theta)f(\theta) + 2\int_\theta^1 f(t)\, dt = -2\theta f(\theta) + \int_0^\theta f(t)\, dt + 2\int_\theta^1 f(t)\, dt = 0.$$

Still not clear how to solve this... Let's take another derivative; all the integrals will then disappear.

$$L''(\delta^*, \theta) = -2\theta f'(\theta) - 2f(\theta) + f(\theta) - 2f(\theta) = -2\theta f'(\theta) - 3f(\theta) = 0.$$

Bingo. This is a differential equation with separating variables that we can now easily solve

$^1$Do you remember how to do it? $\frac{d}{d\theta}\int_0^\theta g(t, \theta)\, dt = g(\theta, \theta) + \int_0^\theta \frac{d}{d\theta} g(t, \theta)\, dt$. If $\theta$ comes in the lower limit of an integral, then the sign is negative.

in terms of the function $f(\theta)$. Separating variables means we can move $f$ and $f'$ to one side and $\theta$ to the other side. This is a standard ODE (ordinary differential equations) exercise,

$$-2\theta f'(\theta) = 3f(\theta) \quad\Rightarrow\quad \frac{f'(\theta)}{f(\theta)} = -\frac{3}{2\theta}$$

$$\Rightarrow\quad \int \frac{f'(\theta)}{f(\theta)}\, d\theta = -\frac{3}{2}\int \frac{d\theta}{\theta} + C_1 \quad\Rightarrow\quad \ln f(\theta) = -\frac{3}{2}\ln\theta + C_1 \quad\Rightarrow\quad f(\theta) = C_2\, \theta^{-3/2}, \quad 0 < \theta < 1.$$

All right! We found the density $f(t)$ of the time when Jim should call, in order to play with an equalizer strategy. We just need to find a suitable normalizing constant $C_2$ to make it a density... Wait! Oh, no. This $f(\theta)$ is not integrable at 0! The integral diverges because of a singularity at 0,

$$\int_0^1 f(\theta)\, d\theta = C_2 \int_0^1 \theta^{-3/2}\, d\theta = \infty.$$

Jim (Berger) is a very smart statistician. He realizes that since the problem is at 0, it means that some neighborhood of 0 should be removed from the set $\mathcal{A}$ of his possible decisions. Let's try that. Assume that the density $f(t)$ is defined on the interval $[K, 1]$, for some $0 < K < 1$,

$$f(\theta) = C\theta^{-3/2}, \qquad K < \theta < 1.$$

To make it a density, the normalizing constant $C$ is

$$C = \frac{1}{\int_K^1 \theta^{-3/2}\, d\theta} = \frac{1}{2\left(\frac{1}{\sqrt{K}} - 1\right)}.$$

It remains to plug this solution into the risk function $L(\delta^*, \theta)$ and see what $K$ makes it constant, so $\delta^*$ becomes an equalizer.

Notice that Jim and Mark are in absolutely similar conditions, so the best strategy for Jim is also the best strategy for Mark. Then both $\delta$ and $\theta$ should be generated from the interval $[K, 1]$.

Substituting the obtained density into the risk, we get

$$\begin{aligned} L(\delta^*, \theta) &= \int_K^\theta (\theta - 2t)\, f(t)\, dt + \int_\theta^1 (2\theta - t)\, f(t)\, dt \\ &= C\int_K^\theta (\theta - 2t)\, t^{-3/2}\, dt + C\int_\theta^1 (2\theta - t)\, t^{-3/2}\, dt \\ &= C\theta \int_K^\theta t^{-3/2}\, dt - 2C\int_K^\theta t^{-1/2}\, dt + 2\theta C\int_\theta^1 t^{-3/2}\, dt - C\int_\theta^1 t^{-1/2}\, dt \\ &= \left[-2C\theta t^{-1/2}\right]_K^\theta - \left[4Ct^{1/2}\right]_K^\theta - \left[4C\theta t^{-1/2}\right]_\theta^1 - \left[2Ct^{1/2}\right]_\theta^1 \\ &= -2C\theta^{1/2} + 2C\theta K^{-1/2} - 4C\theta^{1/2} + 4CK^{1/2} - 4C\theta + 4C\theta^{1/2} - 2C + 2C\theta^{1/2} \\ &= \theta\left(2CK^{-1/2} - 4C\right) + 4CK^{1/2} - 2C = \text{const} \end{aligned}$$

This risk is a linear function of $\theta$. Of course, the second derivative erased the linear term, so

we did not even notice it. Now we see that the risk will be constant (making $\delta^*$ an equalizer) iff $2CK^{-1/2} - 4C = 0$, i.e.,

$$K = \frac{1}{4}.$$

OK, we found it. The equalizer solution (for both Jim and Mark) is given by the density

$$f(t) = Ct^{-3/2}, \quad \text{for } \frac{1}{4} < t < 1, \quad\text{where}\quad C = \frac{1}{2\left(\frac{1}{\sqrt{K}} - 1\right)} = \frac{1}{2\left(\frac{1}{\sqrt{1/4}} - 1\right)} = \frac{1}{2}.$$

Both Jim and Mark have equalizer strategies. Therefore, these are in fact minimax and maximin strategies.
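A Monte Carlo sketch (Python/NumPy assumed for illustration) confirms the equalizer property: Jim's calling time is sampled by inverting the cdf $F(t) = 2C\left(K^{-1/2} - t^{-1/2}\right)$, and his expected loss comes out (near) zero whatever calling time $\theta$ Mark picks:

```python
import numpy as np

rng = np.random.default_rng(7)
K, C = 0.25, 0.5
N = 1_000_000

# inverse-transform sampling from f(t) = C t^(-3/2) on [K, 1]:
# u = F(t) = 2C (K^(-1/2) - t^(-1/2))  =>  t = (K^(-1/2) - u/(2C))^(-2)
u = rng.uniform(size=N)
delta = (1 / np.sqrt(K) - u / (2 * C)) ** (-2)   # Jim's randomized calling times

X = rng.uniform(size=N)                          # Ann's arrival times
for theta in (0.3, 0.5, 0.7, 0.9):               # Mark's deterministic calling time
    jim = (X < delta) & ((theta < X) | (delta < theta))    # Jim gets the date
    mark = (X < theta) & ((delta < X) | (theta < delta))   # Mark gets the date
    # expected loss ~ 0 for every theta, so delta* is an equalizer
    print(theta, (mark.astype(float) - jim.astype(float)).mean())
```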

Berger, ch. 5

FIGURE 12.1: Jim's minimax density.

Chapter 13

Appendix

13.1 Inventory of distributions ...... 159
13.1.1 Discrete families ...... 159
13.1.2 Continuous families ...... 161
13.2 Distribution tables ...... 165
13.3 Calculus review ...... 173
13.3.1 Inverse function ...... 173
13.3.2 Limits and continuity ...... 173
13.3.3 Sequences and series ...... 174
13.3.4 Derivatives, minimum, and maximum ...... 175
13.3.5 Integrals ...... 176

13.1 Inventory of distributions

13.1.1 Discrete families

Bernoulli(p)

Generic description: 0 or 1, success or failure, result of one Bernoulli trial
Range of values: $x = 0, 1$
Parameter: $p \in (0, 1)$, probability of success
Probability mass function: $P(x) = p^x (1-p)^{1-x}$
Expectation: $\mu = p$
Variance: $\sigma^2 = p(1-p)$
Relations: Special case of Binomial($n,p$) when $n = 1$. A sum of $n$ independent Bernoulli($p$) variables is Binomial($n,p$).

Binomial(n,p)

Generic description: Number of successes in $n$ independent Bernoulli trials
Range of values: $x = 0, 1, 2, \ldots, n$
Parameters: $n = 1, 2, 3, \ldots$, number of Bernoulli trials; $p \in (0, 1)$, probability of success
Probability mass function: $P(x) = \binom{n}{x} p^x (1-p)^{n-x}$
Cdf: Table A2


Expectation: $\mu = np$
Variance: $\sigma^2 = np(1-p)$
Relations: Binomial($n,p$) is a sum of $n$ independent Bernoulli($p$) variables. Binomial($1,p$) = Bernoulli($p$).

Geometric(p)

Generic description: Number of Bernoulli trials until the first success
Range of values: $x = 1, 2, 3, \ldots$
Parameter: $p \in (0, 1)$, probability of success
Probability mass function: $P(x) = p(1-p)^{x-1}$
Cdf: $1 - (1-p)^x$
Expectation: $\mu = \frac{1}{p}$
Variance: $\sigma^2 = \frac{1-p}{p^2}$
Relations: Special case of Negative Binomial($k,p$) when $k = 1$. A sum of $n$ independent Geometric($p$) variables is Negative Binomial($n,p$).

Negative Binomial(k,p)

Generic description: Number of Bernoulli trials until the $k$-th success
Range of values: $x = k, k+1, k+2, \ldots$
Parameters: $k = 1, 2, 3, \ldots$, number of successes; $p \in (0, 1)$, probability of success
Probability mass function: $P(x) = \binom{x-1}{k-1} (1-p)^{x-k} p^k$
Expectation: $\mu = \frac{k}{p}$
Variance: $\sigma^2 = \frac{k(1-p)}{p^2}$
Relations: Negative Binomial($k,p$) is a sum of $k$ independent Geometric($p$) variables. Negative Binomial($1,p$) = Geometric($p$).

Poisson(λ)

Generic description: Number of "rare events" during a fixed time interval
Range of values: $x = 0, 1, 2, \ldots$
Parameter: $\lambda \in (0, \infty)$, frequency of "rare events"
Probability mass function: $P(x) = e^{-\lambda}\, \frac{\lambda^x}{x!}$

Cdf: Table A3
Expectation: $\mu = \lambda$
Variance: $\sigma^2 = \lambda$
Relations: Limiting case of Binomial($n,p$) when $n \to \infty$, $p \to 0$, $np \to \lambda$

13.1.2 Continuous families

Beta(α, β)

Generic description: In a sample from the Standard Uniform distribution, it is the distribution of the $k$th smallest observation
Range of values: $0 < x < 1$
Parameters: $\alpha, \beta \in (0, \infty)$
Density: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$
Expectation: $\mu = \alpha/(\alpha+\beta)$
Variance: $\sigma^2 = \alpha\beta / \left[(\alpha+\beta)^2(\alpha+\beta+1)\right]$

Relations: As a prior distribution, Beta family is conjugate to the Binomial model. Beta(1,1) distribution is Uniform(0,1).

Chi-square $\chi^2(\nu)$

Generic description: Sum of squares of $\nu$ iid Standard Normal variables
Range of values: $x > 0$
Parameter: $\nu \in (0, \infty)$, degrees of freedom
Density: $f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)}\, x^{\nu/2 - 1}\, e^{-x/2}$
Expectation: $\mu = \nu$
Variance: $\sigma^2 = 2\nu$

Relations: Special case of Gamma($\alpha, \lambda$) when $\alpha = \nu/2$ and $\lambda = 1/2$. The ratio of two independent $\chi^2$ variables, divided by their respective degrees of freedom, has the F-distribution.

Exponential(λ)

Generic description: In a Poisson process, the time between consecutive events
Range of values: $x > 0$
Parameter: $\lambda \in (0, \infty)$, frequency of events, inverse scale parameter
Density: $f(x) = \lambda e^{-\lambda x}$
Cdf: $F(x) = 1 - e^{-\lambda x}$
Expectation: $\mu = 1/\lambda$
Variance: $\sigma^2 = 1/\lambda^2$

Relations: Special case of Gamma($\alpha, \lambda$) when $\alpha = 1$, and of chi-square with $\nu = 2$ degrees of freedom. A sum of $\alpha$ independent Exponential($\lambda$) variables is Gamma($\alpha, \lambda$).

Fisher-Snedecor F($\nu_1, \nu_2$)

Generic description: Ratio of independent mean squares
Range of values: $0 < x < \infty$
Parameters: $\nu_1 > 0$, numerator degrees of freedom; $\nu_2 > 0$, denominator degrees of freedom
Density function: $f(x) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{x\, \Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \sqrt{\frac{(\nu_1 x)^{\nu_1}\, \nu_2^{\nu_2}}{(\nu_1 x + \nu_2)^{\nu_1+\nu_2}}}$
Expectation: $\frac{\nu_2}{\nu_2 - 2}$ for $\nu_2 > 2$
Variance: $\frac{2\nu_2^2(\nu_1 + \nu_2 - 2)}{\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}$ for $\nu_2 > 4$
Relations: For independent $\chi^2(\nu_1)$ and $\chi^2(\nu_2)$, the ratio $\frac{\chi^2(\nu_1)/\nu_1}{\chi^2(\nu_2)/\nu_2}$ is F($\nu_1, \nu_2$). Special case: a square of a $t(\nu)$ variable is F($1, \nu$).

Gamma(α, λ)

Generic description: In a Poisson process, the time of the $\alpha$-th event
Range of values: $x > 0$
Parameters: $\alpha \in (0, \infty)$, shape parameter; $\lambda \in (0, \infty)$, frequency of events, inverse scale parameter
Density function: $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\lambda x}$
Expectation: $\mu = \alpha/\lambda$
Variance: $\sigma^2 = \alpha/\lambda^2$
Relations: For integer $\alpha$, Gamma($\alpha, \lambda$) is a sum of $\alpha$ independent Exponential($\lambda$) variables. Gamma($1, \lambda$) = Exponential($\lambda$). Gamma($\nu/2, 1/2$) = Chi-square($\nu$). As a prior distribution, the Gamma family is conjugate to the Poisson($\theta$) model.

Normal(µ, σ)

Generic description: Often used as the distribution of errors, measurements, sums, averages, etc.
Range of values: $-\infty < x < \infty$
Parameters: $\mu \in (-\infty, \infty)$, expectation, location parameter; $\sigma \in (0, \infty)$, standard deviation, scale parameter
Density: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
Cdf: Table A1 (Standard Normal distribution)
Expectation: $\mu$
Variance: $\sigma^2$
Relations: As a prior distribution, the Normal family is conjugate to the Normal model.

Pareto(θ, σ)

Generic description: A heavy-tail distribution often used to model amount of internet traffic, various financial and sociological data
Range of values: $x > \sigma$
Parameters: $\theta \in (0, \infty)$, shape parameter; $\sigma \in (0, \infty)$, scale parameter
Density function: $f(x) = \theta\sigma^\theta x^{-\theta-1}$
Cdf: $F(x) = 1 - \left(\frac{x}{\sigma}\right)^{-\theta}$
Expectation: $\frac{\theta\sigma}{\theta-1}$ for $\theta > 1$; does not exist for $\theta \le 1$
Variance: $\frac{\theta\sigma^2}{(\theta-1)^2(\theta-2)}$ for $\theta > 2$; does not exist for $\theta \le 2$
Relations: If $X$ is Exponential($\theta$) then $Y = \sigma e^X$ is Pareto($\theta, \sigma$)
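The last relation is easy to verify by simulation; a quick sketch (Python/NumPy assumed, with $\theta = 3$ and $\sigma = 2$ picked arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(13)
theta, sigma = 3.0, 2.0
X = rng.exponential(scale=1/theta, size=1_000_000)  # Exponential(theta)
Y = sigma * np.exp(X)                               # claimed Pareto(theta, sigma)
for y in (2.5, 3.0, 5.0):
    # empirical cdf of Y versus F(y) = 1 - (y/sigma)^(-theta)
    print(y, (Y <= y).mean(), 1 - (y / sigma) ** (-theta))
```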

In MATLAB, the commands for the Generalized Pareto distribution can be used: for Pareto($\theta, \sigma$), use gpcdf(x, 1/theta, sigma/theta, sigma).

Student's T(ν)

Generic description: Standardized statistic with an estimated standard deviation
Range of values: $-\infty < x < \infty$
Parameter: $\nu > 0$, number of degrees of freedom
Density function: $f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
Expectation: 0
Variance: $\frac{\nu}{\nu-2}$ for $\nu > 2$

Relations: Converges to Normal distribution as ν → ∞

Uniform(a,b)

Generic description: A number selected "at random" from a given interval
Range of values: $a < x < b$
Parameters: $a, b$, the endpoints of the interval, $-\infty < a < b < \infty$
Density: $f(x) = \frac{1}{b-a}$
Expectation: $\mu = \frac{a+b}{2}$
Variance: $\sigma^2 = \frac{(b-a)^2}{12}$
Relations: Beta(1,1) distribution is Uniform(0,1)

13.2 Distribution tables

Table A1. Standard Normal distribution

$$\Phi(z) = P\{Z \le z\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^z e^{-x^2/2}\, dx$$

z -0.09 -0.08 -0.07 -0.06 -0.05 -0.04 -0.03 -0.02 -0.01 -0.00

-(3.9+) .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 .0000 -3.8 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 -3.7 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 -3.6 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0001 .0002 .0002 -3.5 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002 .0002

-3.4 .0002 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 .0003 -3.3 .0003 .0004 .0004 .0004 .0004 .0004 .0004 .0005 .0005 .0005 -3.2 .0005 .0005 .0005 .0006 .0006 .0006 .0006 .0006 .0007 .0007 -3.1 .0007 .0007 .0008 .0008 .0008 .0008 .0009 .0009 .0009 .0010 -3.0 .0010 .0010 .0011 .0011 .0011 .0012 .0012 .0013 .0013 .0013

-2.9 .0014 .0014 .0015 .0015 .0016 .0016 .0017 .0018 .0018 .0019 -2.8 .0019 .0020 .0021 .0021 .0022 .0023 .0023 .0024 .0025 .0026 -2.7 .0026 .0027 .0028 .0029 .0030 .0031 .0032 .0033 .0034 .0035 -2.6 .0036 .0037 .0038 .0039 .0040 .0041 .0043 .0044 .0045 .0047 -2.5 .0048 .0049 .0051 .0052 .0054 .0055 .0057 .0059 .0060 .0062

-2.4 .0064 .0066 .0068 .0069 .0071 .0073 .0075 .0078 .0080 .0082 -2.3 .0084 .0087 .0089 .0091 .0094 .0096 .0099 .0102 .0104 .0107 -2.2 .0110 .0113 .0116 .0119 .0122 .0125 .0129 .0132 .0136 .0139 -2.1 .0143 .0146 .0150 .0154 .0158 .0162 .0166 .0170 .0174 .0179 -2.0 .0183 .0188 .0192 .0197 .0202 .0207 .0212 .0217 .0222 .0228

-1.9 .0233 .0239 .0244 .0250 .0256 .0262 .0268 .0274 .0281 .0287 -1.8 .0294 .0301 .0307 .0314 .0322 .0329 .0336 .0344 .0351 .0359 -1.7 .0367 .0375 .0384 .0392 .0401 .0409 .0418 .0427 .0436 .0446 -1.6 .0455 .0465 .0475 .0485 .0495 .0505 .0516 .0526 .0537 .0548 -1.5 .0559 .0571 .0582 .0594 .0606 .0618 .0630 .0643 .0655 .0668

-1.4 .0681 .0694 .0708 .0721 .0735 .0749 .0764 .0778 .0793 .0808 -1.3 .0823 .0838 .0853 .0869 .0885 .0901 .0918 .0934 .0951 .0968 -1.2 .0985 .1003 .1020 .1038 .1056 .1075 .1093 .1112 .1131 .1151 -1.1 .1170 .1190 .1210 .1230 .1251 .1271 .1292 .1314 .1335 .1357 -1.0 .1379 .1401 .1423 .1446 .1469 .1492 .1515 .1539 .1562 .1587

-0.9 .1611 .1635 .1660 .1685 .1711 .1736 .1762 .1788 .1814 .1841 -0.8 .1867 .1894 .1922 .1949 .1977 .2005 .2033 .2061 .2090 .2119 -0.7 .2148 .2177 .2206 .2236 .2266 .2296 .2327 .2358 .2389 .2420 -0.6 .2451 .2483 .2514 .2546 .2578 .2611 .2643 .2676 .2709 .2743 -0.5 .2776 .2810 .2843 .2877 .2912 .2946 .2981 .3015 .3050 .3085

-0.4 .3121 .3156 .3192 .3228 .3264 .3300 .3336 .3372 .3409 .3446 -0.3 .3483 .3520 .3557 .3594 .3632 .3669 .3707 .3745 .3783 .3821 -0.2 .3859 .3897 .3936 .3974 .4013 .4052 .4090 .4129 .4168 .4207 -0.1 .4247 .4286 .4325 .4364 .4404 .4443 .4483 .4522 .4562 .4602 -0.0 .4641 .4681 .4721 .4761 .4801 .4840 .4880 .4920 .4960 .5000 166 STAT 618 Bayesian Statistics Lecture Notes

Table A1, continued. Standard Normal distribution

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359 0.1 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753 0.2 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141 0.3 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517 0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879

0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224 0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549 0.7 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852 0.8 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133 0.9 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389

1.0 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621 1.1 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830 1.2 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015 1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177 1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319

1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441 1.6 .9452 .9463 .9474 .9484 .9495 .9505 .9515 .9525 .9535 .9545 1.7 .9554 .9564 .9573 .9582 .9591 .9599 .9608 .9616 .9625 .9633 1.8 .9641 .9649 .9656 .9664 .9671 .9678 .9686 .9693 .9699 .9706 1.9 .9713 .9719 .9726 .9732 .9738 .9744 .9750 .9756 .9761 .9767

2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817 2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857 2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890 2.3 .9893 .9896 .9898 .9901 .9904 .9906 .9909 .9911 .9913 .9916 2.4 .9918 .9920 .9922 .9925 .9927 .9929 .9931 .9932 .9934 .9936

2.5 .9938 .9940 .9941 .9943 .9945 .9946 .9948 .9949 .9951 .9952 2.6 .9953 .9955 .9956 .9957 .9959 .9960 .9961 .9962 .9963 .9964 2.7 .9965 .9966 .9967 .9968 .9969 .9970 .9971 .9972 .9973 .9974 2.8 .9974 .9975 .9976 .9977 .9977 .9978 .9979 .9979 .9980 .9981 2.9 .9981 .9982 .9982 .9983 .9984 .9984 .9985 .9985 .9986 .9986

3.0 .9987 .9987 .9987 .9988 .9988 .9989 .9989 .9989 .9990 .9990 3.1 .9990 .9991 .9991 .9991 .9992 .9992 .9992 .9992 .9993 .9993 3.2 .9993 .9993 .9994 .9994 .9994 .9994 .9994 .9995 .9995 .9995 3.3 .9995 .9995 .9995 .9996 .9996 .9996 .9996 .9996 .9996 .9997 3.4 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9997 .9998

3.5 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 .9998 3.6 .9998 .9998 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 3.7 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 3.8 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 .9999 3.9+ 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 Appendix 167

Table A2. Binomial distribution

$$F(x) = P\{X \le x\} = \sum_{k=0}^x \binom{n}{k} p^k (1-p)^{n-k}$$

p n x .050 .100 .150 .200 .250 .300 .350 .400 .450 .500 .550 .600 .650 .700 .750 .800 .850 .900 .950

1 0 .950 .900 .850 .800 .750 .700 .650 .600 .550 .500 .450 .400 .350 .300 .250 .200 .150 .100 .050 2 0 .903 .810 .723 .640 .563 .490 .423 .360 .303 .250 .203 .160 .123 .090 .063 .040 .023 .010 .003 1 .998 .990 .978 .960 .938 .910 .878 .840 .798 .750 .698 .640 .578 .510 .438 .360 .278 .190 .098 3 0 .857 .729 .614 .512 .422 .343 .275 .216 .166 .125 .091 .064 .043 .027 .016 .008 .003 .001 .000 1 .993 .972 .939 .896 .844 .784 .718 .648 .575 .500 .425 .352 .282 .216 .156 .104 .061 .028 .007 2 1.0 .999 .997 .992 .984 .973 .957 .936 .909 .875 .834 .784 .725 .657 .578 .488 .386 .271 .143 4 0 .815 .656 .522 .410 .316 .240 .179 .130 .092 .063 .041 .026 .015 .008 .004 .002 .001 .000 .000 1 .986 .948 .890 .819 .738 .652 .563 .475 .391 .313 .241 .179 .126 .084 .051 .027 .012 .004 .000 2 1.0 .996 .988 .973 .949 .916 .874 .821 .759 .688 .609 .525 .437 .348 .262 .181 .110 .052 .014 3 1.0 1.0 .999 .998 .996 .992 .985 .974 .959 .938 .908 .870 .821 .760 .684 .590 .478 .344 .185 5 0 .774 .590 .444 .328 .237 .168 .116 .078 .050 .031 .018 .010 .005 .002 .001 .000 .000 .000 .000 1 .977 .919 .835 .737 .633 .528 .428 .337 .256 .188 .131 .087 .054 .031 .016 .007 .002 .000 .000 2 .999 .991 .973 .942 .896 .837 .765 .683 .593 .500 .407 .317 .235 .163 .104 .058 .027 .009 .001 3 1.0 1.0 .998 .993 .984 .969 .946 .913 .869 .813 .744 .663 .572 .472 .367 .263 .165 .081 .023 4 1.0 1.0 1.0 1.0 .999 .998 .995 .990 .982 .969 .950 .922 .884 .832 .763 .672 .556 .410 .226 6 0 .735 .531 .377 .262 .178 .118 .075 .047 .028 .016 .008 .004 .002 .001 .000 .000 .000 .000 .000 1 .967 .886 .776 .655 .534 .420 .319 .233 .164 .109 .069 .041 .022 .011 .005 .002 .000 .000 .000 2 .998 .984 .953 .901 .831 .744 .647 .544 .442 .344 .255 .179 .117 .070 .038 .017 .006 .001 .000 3 1.0 .999 .994 .983 .962 .930 .883 .821 .745 .656 .558 .456 .353 .256 .169 .099 .047 .016 .002 4 1.0 1.0 1.0 .998 .995 .989 .978 .959 .931 .891 .836 .767 .681 .580 .466 .345 .224 .114 .033 5 1.0 1.0 1.0 1.0 1.0 .999 .998 .996 .992 .984 .972 .953 .925 .882 .822 .738 .623 .469 .265 7 0 .698 .478 .321 .210 .133 .082 .049 .028 .015 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 1 .956 .850 .717 .577 .445 .329 .234 .159 .102 .063 .036 .019 .009 .004 .001 .000 .000 .000 .000 2 .996 .974 .926 .852 .756 .647 .532 .420 .316 .227 .153 .096 .056 .029 .013 .005 .001 .000 .000 3 1.0 .997 .988 .967 .929 .874 .800 .710 .608 .500 .392 .290 .200 .126 .071 .033 .012 .003 .000 4 1.0 1.0 .999 .995 .987 .971 .944 .904 .847 .773 .684 .580 .468 .353 .244 .148 .074 .026 .004 5 1.0 1.0 1.0 1.0 .999 .996 .991 .981 .964 .938 .898 .841 .766 .671 .555 .423 .283 .150 .044 6 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .996 .992 .985 .972 .951 .918 .867 .790 .679 .522 .302 8 0 .663 .430 .272 .168 .100 .058 .032 .017 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 .000 1 .943 .813 .657 .503 .367 .255 .169 .106 .063 .035 .018 .009 .004 .001 .000 .000 .000 .000 .000 2 .994 .962 .895 .797 .679 .552 .428 .315 .220 .145 .088 .050 .025 .011 .004 .001 .000 .000 .000 3 1.0 .995 .979 .944 .886 .806 .706 .594 .477 .363 .260 .174 .106 .058 .027 .010 .003 .000 .000 4 1.0 1.0 .997 .990 .973 .942 .894 .826 .740 .637 .523 .406 .294 .194 .114 .056 .021 .005 .000 5 1.0 1.0 1.0 .999 .996 .989 .975 .950 .912 .855 .780 .685 .572 .448 .321 .203 .105 .038 .006 6 1.0 1.0 1.0 1.0 1.0 .999 .996 .991 .982 .965 .937 .894 .831 .745 .633 .497 .343 .187 .057 7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .996 .992 .983 .968 .942 .900 .832 .728 .570 .337 9 0 .630 .387 .232 .134 .075 .040 .021 .010 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000 
.000 1 .929 .775 .599 .436 .300 .196 .121 .071 .039 .020 .009 .004 .001 .000 .000 .000 .000 .000 .000 2 .992 .947 .859 .738 .601 .463 .337 .232 .150 .090 .050 .025 .011 .004 .001 .000 .000 .000 .000 3 .999 .992 .966 .914 .834 .730 .609 .483 .361 .254 .166 .099 .054 .025 .010 .003 .001 .000 .000 4 1.0 .999 .994 .980 .951 .901 .828 .733 .621 .500 .379 .267 .172 .099 .049 .020 .006 .001 .000 5 1.0 1.0 .999 .997 .990 .975 .946 .901 .834 .746 .639 .517 .391 .270 .166 .086 .034 .008 .001 6 1.0 1.0 1.0 1.0 .999 .996 .989 .975 .950 .910 .850 .768 .663 .537 .399 .262 .141 .053 .008 7 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .991 .980 .961 .929 .879 .804 .700 .564 .401 .225 .071 8 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .995 .990 .979 .960 .925 .866 .768 .613 .370 10 0 .599 .349 .197 .107 .056 .028 .013 .006 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .914 .736 .544 .376 .244 .149 .086 .046 .023 .011 .005 .002 .001 .000 .000 .000 .000 .000 .000 2 .988 .930 .820 .678 .526 .383 .262 .167 .100 .055 .027 .012 .005 .002 .000 .000 .000 .000 .000 3 .999 .987 .950 .879 .776 .650 .514 .382 .266 .172 .102 .055 .026 .011 .004 .001 .000 .000 .000 4 1.0 .998 .990 .967 .922 .850 .751 .633 .504 .377 .262 .166 .095 .047 .020 .006 .001 .000 .000 5 1.0 1.0 .999 .994 .980 .953 .905 .834 .738 .623 .496 .367 .249 .150 .078 .033 .010 .002 .000 6 1.0 1.0 1.0 .999 .996 .989 .974 .945 .898 .828 .734 .618 .486 .350 .224 .121 .050 .013 .001 7 1.0 1.0 1.0 1.0 1.0 .998 .995 .988 .973 .945 .900 .833 .738 .617 .474 .322 .180 .070 .012 8 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .995 .989 .977 .954 .914 .851 .756 .624 .456 .264 .086 9 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .994 .987 .972 .944 .893 .803 .651 .401 168 STAT 618 Bayesian Statistics Lecture Notes

Table A2, continued. Binomial distribution

p n x .050 .100 .150 .200 .250 .300 .350 .400 .450 .500 .550 .600 .650 .700 .750 .800 .850 .900 .950

11 0 .569 .314 .167 .086 .042 .020 .009 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .898 .697 .492 .322 .197 .113 .061 .030 .014 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 2 .985 .910 .779 .617 .455 .313 .200 .119 .065 .033 .015 .006 .002 .001 .000 .000 .000 .000 .000 3 .998 .981 .931 .839 .713 .570 .426 .296 .191 .113 .061 .029 .012 .004 .001 .000 .000 .000 .000 4 1.0 .997 .984 .950 .885 .790 .668 .533 .397 .274 .174 .099 .050 .022 .008 .002 .000 .000 .000 5 1.0 1.0 .997 .988 .966 .922 .851 .753 .633 .500 .367 .247 .149 .078 .034 .012 .003 .000 .000 6 1.0 1.0 1.0 .998 .992 .978 .950 .901 .826 .726 .603 .467 .332 .210 .115 .050 .016 .003 .000 7 1.0 1.0 1.0 1.0 .999 .996 .988 .971 .939 .887 .809 .704 .574 .430 .287 .161 .069 .019 .002 8 1.0 1.0 1.0 1.0 1.0 .999 .998 .994 .985 .967 .935 .881 .800 .687 .545 .383 .221 .090 .015 9 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .994 .986 .970 .939 .887 .803 .678 .508 .303 .102 10 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .991 .980 .958 .914 .833 .686 .431 12 0 .540 .282 .142 .069 .032 .014 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .882 .659 .443 .275 .158 .085 .042 .020 .008 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 2 .980 .889 .736 .558 .391 .253 .151 .083 .042 .019 .008 .003 .001 .000 .000 .000 .000 .000 .000 3 .998 .974 .908 .795 .649 .493 .347 .225 .134 .073 .036 .015 .006 .002 .000 .000 .000 .000 .000 4 1.0 .996 .976 .927 .842 .724 .583 .438 .304 .194 .112 .057 .026 .009 .003 .001 .000 .000 .000 5 1.0 .999 .995 .981 .946 .882 .787 .665 .527 .387 .261 .158 .085 .039 .014 .004 .001 .000 .000 6 1.0 1.0 .999 .996 .986 .961 .915 .842 .739 .613 .473 .335 .213 .118 .054 .019 .005 .001 .000 7 1.0 1.0 1.0 .999 .997 .991 .974 .943 .888 .806 .696 .562 .417 .276 .158 .073 .024 .004 .000 8 1.0 1.0 1.0 1.0 1.0 .998 .994 .985 .964 .927 .866 .775 .653 .507 .351 .205 .092 .026 .002 9 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .992 .981 .958 .917 .849 .747 .609 .442 .264 .111 .020 10 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .992 .980 .958 .915 .842 .725 .557 .341 .118 11 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .994 .986 .968 .931 .858 .718 .460 13 0 .513 .254 .121 .055 .024 .010 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .865 .621 .398 .234 .127 .064 .030 .013 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 2 .975 .866 .692 .502 .333 .202 .113 .058 .027 .011 .004 .001 .000 .000 .000 .000 .000 .000 .000 3 .997 .966 .882 .747 .584 .421 .278 .169 .093 .046 .020 .008 .003 .001 .000 .000 .000 .000 .000 4 1.0 .994 .966 .901 .794 .654 .501 .353 .228 .133 .070 .032 .013 .004 .001 .000 .000 .000 .000 5 1.0 .999 .992 .970 .920 .835 .716 .574 .427 .291 .179 .098 .046 .018 .006 .001 .000 .000 .000 6 1.0 1.0 .999 .993 .976 .938 .871 .771 .644 .500 .356 .229 .129 .062 .024 .007 .001 .000 .000 7 1.0 1.0 1.0 .999 .994 .982 .954 .902 .821 .709 .573 .426 .284 .165 .080 .030 .008 .001 .000 8 1.0 1.0 1.0 1.0 .999 .996 .987 .968 .930 .867 .772 .647 .499 .346 .206 .099 .034 .006 .000 9 1.0 1.0 1.0 1.0 1.0 .999 .997 .992 .980 .954 .907 .831 .722 .579 .416 .253 .118 .034 .003 10 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .989 .973 .942 .887 .798 .667 .498 .308 .134 .025 11 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .995 .987 .970 .936 .873 .766 .602 .379 .135 12 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .990 .976 .945 .879 .746 .487 14 0 .488 .229 .103 .044 .018 .007 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .847 .585 .357 .198 .101 .047 .021 .008 .003 .001 .000 
.000 .000 .000 .000 .000 .000 .000 .000 2 .970 .842 .648 .448 .281 .161 .084 .040 .017 .006 .002 .001 .000 .000 .000 .000 .000 .000 .000 3 .996 .956 .853 .698 .521 .355 .220 .124 .063 .029 .011 .004 .001 .000 .000 .000 .000 .000 .000 4 1.0 .991 .953 .870 .742 .584 .423 .279 .167 .090 .043 .018 .006 .002 .000 .000 .000 .000 .000 5 1.0 .999 .988 .956 .888 .781 .641 .486 .337 .212 .119 .058 .024 .008 .002 .000 .000 .000 .000 6 1.0 1.0 .998 .988 .962 .907 .816 .692 .546 .395 .259 .150 .075 .031 .010 .002 .000 .000 .000 7 1.0 1.0 1.0 .998 .990 .969 .925 .850 .741 .605 .454 .308 .184 .093 .038 .012 .002 .000 .000 8 1.0 1.0 1.0 1.0 .998 .992 .976 .942 .881 .788 .663 .514 .359 .219 .112 .044 .012 .001 .000 9 1.0 1.0 1.0 1.0 1.0 .998 .994 .982 .957 .910 .833 .721 .577 .416 .258 .130 .047 .009 .000 10 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .989 .971 .937 .876 .780 .645 .479 .302 .147 .044 .004 11 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .994 .983 .960 .916 .839 .719 .552 .352 .158 .030 12 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .992 .979 .953 .899 .802 .643 .415 .153 13 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .993 .982 .956 .897 .771 .512 15 0 .463 .206 .087 .035 .013 .005 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .829 .549 .319 .167 .080 .035 .014 .005 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 2 .964 .816 .604 .398 .236 .127 .062 .027 .011 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 3 .995 .944 .823 .648 .461 .297 .173 .091 .042 .018 .006 .002 .000 .000 .000 .000 .000 .000 .000 4 .999 .987 .938 .836 .686 .515 .352 .217 .120 .059 .025 .009 .003 .001 .000 .000 .000 .000 .000 5 1.0 .998 .983 .939 .852 .722 .564 .403 .261 .151 .077 .034 .012 .004 .001 .000 .000 .000 .000 6 1.0 1.0 .996 .982 .943 .869 .755 .610 .452 .304 .182 .095 .042 .015 .004 .001 .000 .000 .000 7 1.0 1.0 .999 .996 .983 .950 .887 .787 .654 .500 .346 .213 .113 .050 .017 .004 .001 .000 .000 8 1.0 1.0 1.0 .999 .996 .985 .958 .905 .818 .696 .548 .390 .245 .131 .057 .018 .004 .000 .000 9 1.0 1.0 1.0 1.0 .999 .996 .988 .966 .923 .849 .739 .597 .436 .278 .148 .061 .017 .002 .000 10 1.0 1.0 1.0 1.0 1.0 .999 .997 .991 .975 .941 .880 .783 .648 .485 .314 .164 .062 .013 .001 11 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .994 .982 .958 .909 .827 .703 .539 .352 .177 .056 .005 12 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .989 .973 .938 .873 .764 .602 .396 .184 .036 13 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .995 .986 .965 .920 .833 .681 .451 .171 14 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .995 .987 .965 .913 .794 .537 Appendix 169

Table A2, continued. Binomial distribution

p n x .050 .100 .150 .200 .250 .300 .350 .400 .450 .500 .550 .600 .650 .700 .750 .800 .850 .900 .950

16 1 .811 .515 .284 .141 .063 .026 .010 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 2 .957 .789 .561 .352 .197 .099 .045 .018 .007 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 3 .993 .932 .790 .598 .405 .246 .134 .065 .028 .011 .003 .001 .000 .000 .000 .000 .000 .000 .000 4 .999 .983 .921 .798 .630 .450 .289 .167 .085 .038 .015 .005 .001 .000 .000 .000 .000 .000 .000 5 1.0 .997 .976 .918 .810 .660 .490 .329 .198 .105 .049 .019 .006 .002 .000 .000 .000 .000 .000 6 1.0 .999 .994 .973 .920 .825 .688 .527 .366 .227 .124 .058 .023 .007 .002 .000 .000 .000 .000 7 1.0 1.0 .999 .993 .973 .926 .841 .716 .563 .402 .256 .142 .067 .026 .007 .001 .000 .000 .000 8 1.0 1.0 1.0 .999 .993 .974 .933 .858 .744 .598 .437 .284 .159 .074 .027 .007 .001 .000 .000 9 1.0 1.0 1.0 1.0 .998 .993 .977 .942 .876 .773 .634 .473 .312 .175 .080 .027 .006 .001 .000 10 1.0 1.0 1.0 1.0 1.0 .998 .994 .981 .951 .895 .802 .671 .510 .340 .190 .082 .024 .003 .000 11 1.0 1.0 1.0 1.0 1.0 1.0 .999 .995 .985 .962 .915 .833 .711 .550 .370 .202 .079 .017 .001 12 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .989 .972 .935 .866 .754 .595 .402 .210 .068 .007 13 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .998 .993 .982 .955 .901 .803 .648 .439 .211 .043 14 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .990 .974 .937 .859 .716 .485 .189 15 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .990 .972 .926 .815 .560 18 1 .774 .450 .224 .099 .039 .014 .005 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 2 .942 .734 .480 .271 .135 .060 .024 .008 .003 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 3 .989 .902 .720 .501 .306 .165 .078 .033 .012 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 y 4 .998 .972 .879 .716 .519 .333 .189 .094 .041 .015 .005 .001 .000 .000 .000 .000 .000 .000 .000 5 1.0 .994 .958 .867 .717 .534 .355 .209 .108 .048 .018 .006 .001 .000 .000 .000 .000 .000 .000 6 1.0 .999 .988 .949 .861 .722 .549 .374 .226 .119 .054 .020 .006 .001 .000 .000 .000 .000 .000 7 1.0 1.0 .997 .984 .943 .859 .728 .563 .391 .240 .128 .058 .021 .006 .001 .000 .000 .000 .000 8 1.0 1.0 .999 .996 .981 .940 .861 .737 .578 .407 .253 .135 .060 .021 .005 .001 .000 .000 .000 9 1.0 1.0 1.0 .999 .995 .979 .940 .865 .747 .593 .422 .263 .139 .060 .019 .004 .001 .000 .000 10 1.0 1.0 1.0 1.0 .999 .994 .979 .942 .872 .760 .609 .437 .272 .141 .057 .016 .003 .000 .000 11 1.0 1.0 1.0 1.0 1.0 .999 .994 .980 .946 .881 .774 .626 .451 .278 .139 .051 .012 .001 .000 12 1.0 1.0 1.0 1.0 1.0 1.0 .999 .994 .982 .952 .892 .791 .645 .466 .283 .133 .042 .006 .000 13 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .995 .985 .959 .906 .811 .667 .481 .284 .121 .028 .002 14 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .988 .967 .922 .835 .694 .499 .280 .098 .011 15 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .997 .992 .976 .940 .865 .729 .520 .266 .058 16 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .995 .986 .961 .901 .776 .550 .226 20 1 .736 .392 .176 .069 .024 .008 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 2 .925 .677 .405 .206 .091 .035 .012 .004 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 y 3 .984 .867 .648 .411 .225 .107 .044 .016 .005 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 4 .997 .957 .830 .630 .415 .238 .118 .051 .019 .006 .002 .000 .000 .000 .000 .000 .000 .000 .000 5 1.0 .989 .933 .804 .617 .416 .245 .126 .055 .021 .006 .002 .000 .000 .000 .000 .000 .000 .000 6 1.0 .998 .978 .913 .786 .608 .417 .250 .130 .058 .021 .006 .002 .000 .000 .000 .000 .000 .000 7 1.0 1.0 .994 .968 .898 .772 .601 .416 .252 .132 .058 .021 
.006 .001 .000 .000 .000 .000 .000 8 1.0 1.0 .999 .990 .959 .887 .762 .596 .414 .252 .131 .057 .020 .005 .001 .000 .000 .000 .000 9 1.0 1.0 1.0 .997 .986 .952 .878 .755 .591 .412 .249 .128 .053 .017 .004 .001 .000 .000 .000 10 1.0 1.0 1.0 .999 .996 .983 .947 .872 .751 .588 .409 .245 .122 .048 .014 .003 .000 .000 .000 11 1.0 1.0 1.0 1.0 .999 .995 .980 .943 .869 .748 .586 .404 .238 .113 .041 .010 .001 .000 .000 12 1.0 1.0 1.0 1.0 1.0 .999 .994 .979 .942 .868 .748 .584 .399 .228 .102 .032 .006 .000 .000 13 1.0 1.0 1.0 1.0 1.0 1.0 .998 .994 .979 .942 .870 .750 .583 .392 .214 .087 .022 .002 .000 14 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .994 .979 .945 .874 .755 .584 .383 .196 .067 .011 .000 15 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .994 .981 .949 .882 .762 .585 .370 .170 .043 .003 16 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .995 .984 .956 .893 .775 .589 .352 .133 .016 25 2 .873 .537 .254 .098 .032 .009 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 3 .966 .764 .471 .234 .096 .033 .010 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 4 .993 .902 .682 .421 .214 .090 .032 .009 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 5 .999 .967 .838 .617 .378 .193 .083 .029 .009 .002 .000 .000 .000 .000 .000 .000 .000 .000 .000 6 1.0 .991 .930 .780 .561 .341 .173 .074 .026 .007 .002 .000 .000 .000 .000 .000 .000 .000 .000 7 1.0 .998 .975 .891 .727 .512 .306 .154 .064 .022 .006 .001 .000 .000 .000 .000 .000 .000 .000 8 1.0 1.0 .992 .953 .851 .677 .467 .274 .134 .054 .017 .004 .001 .000 .000 .000 .000 .000 .000 9 1.0 1.0 .998 .983 .929 .811 .630 .425 .242 .115 .044 .013 .003 .000 .000 .000 .000 .000 .000 10 1.0 1.0 1.0 .994 .970 .902 .771 .586 .384 .212 .096 .034 .009 .002 .000 .000 .000 .000 .000 11 1.0 1.0 1.0 .998 .989 .956 .875 .732 .543 .345 .183 .078 .025 .006 .001 .000 .000 .000 .000 12 1.0 1.0 1.0 1.0 .997 .983 .940 .846 .694 .500 .306 .154 .060 .017 .003 .000 .000 .000 .000 13 1.0 1.0 1.0 1.0 .999 .994 .975 .922 .817 .655 .457 .268 .125 .044 .011 .002 .000 .000 .000 14 1.0 1.0 1.0 1.0 1.0 .998 .991 .966 .904 .788 .616 .414 .229 .098 .030 .006 .000 .000 .000 15 1.0 1.0 1.0 1.0 1.0 1.0 .997 .987 .956 .885 .758 .575 .370 .189 .071 .017 .002 .000 .000 16 1.0 1.0 1.0 1.0 1.0 1.0 .999 .996 .983 .946 .866 .726 .533 .323 .149 .047 .008 .000 .000 17 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .999 .994 .978 .936 .846 .694 .488 .273 .109 .025 .002 .000 18 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .993 .974 .926 .827 .659 .439 .220 .070 .009 .000 19 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .991 .971 .917 .807 .622 .383 .162 .033 .001 20 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 .998 .991 .968 .910 .786 .579 .318 .098 .007 170 STAT 618 Bayesian Statistics Lecture Notes

Table A3. Poisson distribution

$$F(x) = P\{X \le x\} = \sum_{k=0}^x \frac{e^{-\lambda} \lambda^k}{k!}$$

λ x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5

0 .905 .819 .741 .670 .607 .549 .497 .449 .407 .368 .333 .301 .273 .247 .223 1 .995 .982 .963 .938 .910 .878 .844 .809 .772 .736 .699 .663 .627 .592 .558 2 1.00 .999 .996 .992 .986 .977 .966 .953 .937 .920 .900 .879 .857 .833 .809 3 1.00 1.00 1.00 .999 .998 .997 .994 .991 .987 .981 .974 .966 .957 .946 .934 4 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .998 .996 .995 .992 .989 .986 .981 5 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .998 .998 .997 .996 6 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 7 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

λ x 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3.0

0 .202 .183 .165 .150 .135 .122 .111 .100 .091 .082 .074 .067 .061 .055 .050 1 .525 .493 .463 .434 .406 .380 .355 .331 .308 .287 .267 .249 .231 .215 .199 2 .783 .757 .731 .704 .677 .650 .623 .596 .570 .544 .518 .494 .469 .446 .423 3 .921 .907 .891 .875 .857 .839 .819 .799 .779 .758 .736 .714 .692 .670 .647 4 .976 .970 .964 .956 .947 .938 .928 .916 .904 .891 .877 .863 .848 .832 .815 5 .994 .992 .990 .987 .983 .980 .975 .970 .964 .958 .951 .943 .935 .926 .916

6 .999 .998 .997 .997 .995 .994 .993 .991 .988 .986 .983 .979 .976 .971 .966 7 1.00 1.00 .999 .999 .999 .999 .998 .997 .997 .996 .995 .993 .992 .990 .988 8 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .999 .999 .998 .998 .997 .996 9 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .999 .999 10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

λ x 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0 10.5

0 .030 .018 .011 .007 .004 .002 .002 .001 .001 .000 .000 .000 .000 .000 .000 1 .136 .092 .061 .040 .027 .017 .011 .007 .005 .003 .002 .001 .001 .000 .000 2 .321 .238 .174 .125 .088 .062 .043 .030 .020 .014 .009 .006 .004 .003 .002 3 .537 .433 .342 .265 .202 .151 .112 .082 .059 .042 .030 .021 .015 .010 .007 4 .725 .629 .532 .440 .358 .285 .224 .173 .132 .100 .074 .055 .040 .029 .021 5 .858 .785 .703 .616 .529 .446 .369 .301 .241 .191 .150 .116 .089 .067 .050

6 .935 .889 .831 .762 .686 .606 .527 .450 .378 .313 .256 .207 .165 .130 .102 7 .973 .949 .913 .867 .809 .744 .673 .599 .525 .453 .386 .324 .269 .220 .179 8 .990 .979 .960 .932 .894 .847 .792 .729 .662 .593 .523 .456 .392 .333 .279 9 .997 .992 .983 .968 .946 .916 .877 .830 .776 .717 .653 .587 .522 .458 .397 10 .999 .997 .993 .986 .975 .957 .933 .901 .862 .816 .763 .706 .645 .583 .521

11 1.00 .999 .998 .995 .989 .980 .966 .947 .921 .888 .849 .803 .752 .697 .639 12 1.00 1.00 .999 .998 .996 .991 .984 .973 .957 .936 .909 .876 .836 .792 .742 13 1.00 1.00 1.00 .999 .998 .996 .993 .987 .978 .966 .949 .926 .898 .864 .825 14 1.00 1.00 1.00 1.00 .999 .999 .997 .994 .990 .983 .973 .959 .940 .917 .888 15 1.00 1.00 1.00 1.00 1.00 .999 .999 .998 .995 .992 .986 .978 .967 .951 .932

16 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .996 .993 .989 .982 .973 .960 17 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .997 .995 .991 .986 .978 18 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .998 .996 .993 .988 19 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .998 .997 .994 20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .997 Appendix 171

Table A3, continued. Poisson distribution

λ x 11 12 13 14 15 16 17 18 19 20 22 24 26 28 30

0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 1 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 2 .001 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 3 .005 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 4 .015 .008 .004 .002 .001 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 5 .038 .020 .011 .006 .003 .001 .001 .000 .000 .000 .000 .000 .000 .000 .000

6 .079 .046 .026 .014 .008 .004 .002 .001 .001 .000 .000 .000 .000 .000 .000 7 .143 .090 .054 .032 .018 .010 .005 .003 .002 .001 .000 .000 .000 .000 .000 8 .232 .155 .100 .062 .037 .022 .013 .007 .004 .002 .001 .000 .000 .000 .000 9 .341 .242 .166 .109 .070 .043 .026 .015 .009 .005 .002 .000 .000 .000 .000 10 .460 .347 .252 .176 .118 .077 .049 .030 .018 .011 .004 .001 .000 .000 .000

11 .579 .462 .353 .260 .185 .127 .085 .055 .035 .021 .008 .003 .001 .000 .000 12 .689 .576 .463 .358 .268 .193 .135 .092 .061 .039 .015 .005 .002 .001 .000 13 .781 .682 .573 .464 .363 .275 .201 .143 .098 .066 .028 .011 .004 .001 .000 14 .854 .772 .675 .570 .466 .368 .281 .208 .150 .105 .048 .020 .008 .003 .001 15 .907 .844 .764 .669 .568 .467 .371 .287 .215 .157 .077 .034 .014 .005 .002

16 .944 .899 .835 .756 .664 .566 .468 .375 .292 .221 .117 .056 .025 .010 .004 17 .968 .937 .890 .827 .749 .659 .564 .469 .378 .297 .169 .087 .041 .018 .007 18 .982 .963 .930 .883 .819 .742 .655 .562 .469 .381 .232 .128 .065 .030 .013 19 .991 .979 .957 .923 .875 .812 .736 .651 .561 .470 .306 .180 .097 .048 .022 20 .995 .988 .975 .952 .917 .868 .805 .731 .647 .559 .387 .243 .139 .073 .035

21 .998 .994 .986 .971 .947 .911 .861 .799 .725 .644 .472 .314 .190 .106 .054 22 .999 .997 .992 .983 .967 .942 .905 .855 .793 .721 .556 .392 .252 .148 .081 23 1.00 .999 .996 .991 .981 .963 .937 .899 .849 .787 .637 .473 .321 .200 .115 24 1.00 .999 .998 .995 .989 .978 .959 .932 .893 .843 .712 .554 .396 .260 .157 25 1.00 1.00 .999 .997 .994 .987 .975 .955 .927 .888 .777 .632 .474 .327 .208

26 1.00 1.00 1.00 .999 .997 .993 .985 .972 .951 .922 .832 .704 .552 .400 .267 27 1.00 1.00 1.00 .999 .998 .996 .991 .983 .969 .948 .877 .768 .627 .475 .333 28 1.00 1.00 1.00 1.00 .999 .998 .995 .990 .980 .966 .913 .823 .697 .550 .403 29 1.00 1.00 1.00 1.00 1.00 .999 .997 .994 .988 .978 .940 .868 .759 .623 .476 30 1.00 1.00 1.00 1.00 1.00 .999 .999 .997 .993 .987 .959 .904 .813 .690 .548

31 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .996 .992 .973 .932 .859 .752 .619 32 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .995 .983 .953 .896 .805 .685 33 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .997 .989 .969 .925 .850 .744 34 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .999 .994 .979 .947 .888 .797 35 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .996 .987 .964 .918 .843

36 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .998 .992 .976 .941 .880 37 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .995 .984 .959 .911 38 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .997 .990 .972 .935 39 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .998 .994 .981 .954 40 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .996 .988 .968

41 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 .992 .978 42 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .995 .985 43 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .997 .990 44 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .998 .994 45 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .996

46 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 .998 47 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 48 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 49 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 .999 50 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 172 STAT 618 Bayesian Statistics Lecture Notes

Table A4. Table of Student’s T-distribution

$t_\alpha$; critical values, such that $P\{t > t_\alpha\} = \alpha$

α, the right-tail probability:
ν (d.f.) .10 .05 .025 .02 .01 .005 .0025 .001 .0005 .0001

1 3.078 6.314 12.706 15.89 31.82 63.66 127.3 318.3 636.6 3185 2 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60 70.71 3 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92 22.20 4 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610 13.04 5 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.894 6.869 9.676

6 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959 8.023 7 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408 7.064 8 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041 6.442 9 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781 6.009 10 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587 5.694

11 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437 5.453 12 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318 5.263 13 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221 5.111 14 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140 4.985 15 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073 4.880

16 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015 4.790 17 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965 4.715 18 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.610 3.922 4.648 19 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883 4.590 20 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850 4.539

21 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819 4.492 22 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792 4.452 23 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.768 4.416 24 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745 4.382 25 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725 4.352

26 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707 4.324 27 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.689 4.299 28 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674 4.276 29 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.660 4.254 30 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646 4.234

32 1.309 1.694 2.037 2.141 2.449 2.738 3.015 3.365 3.622 4.198 34 1.307 1.691 2.032 2.136 2.441 2.728 3.002 3.348 3.601 4.168 36 1.306 1.688 2.028 2.131 2.434 2.719 2.990 3.333 3.582 4.140 38 1.304 1.686 2.024 2.127 2.429 2.712 2.980 3.319 3.566 4.115 40 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551 4.094

45 1.301 1.679 2.014 2.115 2.412 2.690 2.952 3.281 3.520 4.049 50 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496 4.014 55 1.297 1.673 2.004 2.104 2.396 2.668 2.925 3.245 3.476 3.985 60 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460 3.962 70 1.294 1.667 1.994 2.093 2.381 2.648 2.899 3.211 3.435 3.926

80 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416 3.899 90 1.291 1.662 1.987 2.084 2.368 2.632 2.878 3.183 3.402 3.878 100 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390 3.861 200 1.286 1.653 1.972 2.067 2.345 2.601 2.838 3.131 3.340 3.789 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.090 3.290 3.719 ∞ Appendix 173

13.3 Calculus review

This section is a very brief summary of the calculus skills required for reading this book.

13.3.1 Inverse function

Function g is the inverse function of function f if

g(f(x)) = x and f(g(y)) = y

for all x and y where f(x) and g(y) exist.

Notation: the inverse function is denoted $g = f^{-1}$.

1 (Don’t confuse the inverse f − (x) with 1/f(x). These are different functions!) To find the inverse function, solve the equation

$$f(x) = y.$$

The solution g(y) is the inverse of f.

For example, to find the inverse of $f(x) = 3 + 1/x$, we solve the equation
$$3 + 1/x = y \;\Rightarrow\; 1/x = y - 3 \;\Rightarrow\; x = \frac{1}{y - 3}.$$
The inverse function of $f$ is $g(y) = 1/(y - 3)$.
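This can be sanity-checked numerically; here is a minimal Python sketch (our illustration, not part of the original notes):

    # Check that g undoes f for f(x) = 3 + 1/x and g(y) = 1/(y - 3).
    def f(x):
        return 3 + 1 / x

    def g(y):
        return 1 / (y - 3)

    for x in [0.5, 1.0, 2.0, 10.0]:
        assert abs(g(f(x)) - x) < 1e-12        # g(f(x)) = x
        assert abs(f(g(f(x))) - f(x)) < 1e-12  # f(g(y)) = y for y = f(x)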

13.3.2 Limits and continuity

A function f(x) has a limit L at a point $x_0$ if f(x) approaches L when x approaches $x_0$. To say it more rigorously, for any $\varepsilon$ there exists such $\delta$ that f(x) is $\varepsilon$-close to L when x is $\delta$-close to $x_0$. That is,

if $|x - x_0| < \delta$ then $|f(x) - L| < \varepsilon$.

A function f(x) has a limit L at $+\infty$ if f(x) approaches L when x goes to $+\infty$. Rigorously, for any $\varepsilon$ there exists such N that f(x) is $\varepsilon$-close to L when x gets beyond N, i.e.,

if $x > N$ then $|f(x) - L| < \varepsilon$.

Similarly, f(x) has a limit L at $-\infty$ if for any $\varepsilon$ there exists such N that f(x) is $\varepsilon$-close to L when x gets below $(-N)$, i.e.,

if $x < -N$ then $|f(x) - L| < \varepsilon$.

Notation: $\lim_{x \to x_0} f(x) = L$, or $f(x) \to L$ as $x \to x_0$;
$\lim_{x \to \infty} f(x) = L$, or $f(x) \to L$ as $x \to \infty$;
$\lim_{x \to -\infty} f(x) = L$, or $f(x) \to L$ as $x \to -\infty$.

Function f is continuous at a point $x_0$ if
$$\lim_{x \to x_0} f(x) = f(x_0).$$

Function f is continuous if it is continuous at every point.
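A limit can be explored numerically. The following short Python sketch (our illustration, using the standard example $\sin(x)/x \to 1$ as $x \to 0$) shows the values approaching the limit, as the $\varepsilon$-$\delta$ definition above describes:

    import math

    # Values of sin(x)/x approach 1 as x approaches 0.
    for x in [0.1, 0.01, 0.001, 0.0001]:
        print(x, math.sin(x) / x)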

13.3.3 Sequences and series

A sequence is a function of a positive integer argument,

f(n) where $n = 1, 2, 3, \dots$

Sequence f(n) converges to L if
$$\lim_{n \to \infty} f(n) = L$$
and diverges to infinity if for any M there exists N such that

$f(n) > M$ when $n > N$.

A series is a sequence of partial sums,

$$f(n) = \sum_{k=1}^{n} a_k = a_1 + a_2 + \dots + a_n.$$
A geometric series is defined by
$$a_n = C r^n,$$

where r is called the ratio of the series. In general,

$$\sum_{k=m}^{n} C r^k = f(n) - f(m-1) = C\,\frac{r^m - r^{n+1}}{1 - r}.$$
For m = 0, we get
$$\sum_{k=0}^{n} C r^k = C\,\frac{1 - r^{n+1}}{1 - r}.$$
A geometric series converges if and only if $|r| < 1$. In this case,
$$\lim_{n \to \infty} \sum_{k=m}^{n} C r^k = \frac{C r^m}{1 - r} \quad \text{and} \quad \sum_{k=0}^{\infty} C r^k = \frac{C}{1 - r}.$$
A geometric series diverges to $\infty$ if $r \geq 1$.
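As a quick numerical check of these closed forms, here is a short Python sketch (our illustration; the values of C, r, and n are arbitrary):

    # Compare a partial sum of a geometric series with its closed form,
    # and with the infinite-sum limit C / (1 - r) for |r| < 1.
    C, r, n = 2.0, 0.5, 20
    partial = sum(C * r**k for k in range(n + 1))
    closed = C * (1 - r**(n + 1)) / (1 - r)
    print(partial, closed)   # equal up to rounding
    print(C / (1 - r))       # the limit 4.0, already close to the partial sum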

13.3.4 Derivatives, minimum, and maximum

Derivative of a function f at a point x is the limit

$$f'(x) = \lim_{y \to x} \frac{f(y) - f(x)}{y - x},$$
provided that this limit exists. Taking a derivative is called differentiation. A function that has derivatives is called differentiable.

Notation: $f'(x)$ or $\frac{d}{dx}\,f(x)$

Differentiating a function of several variables, we take partial derivatives, denoted $\frac{\partial}{\partial x_1} f(x_1, x_2, \dots)$, $\frac{\partial}{\partial x_2} f(x_1, x_2, \dots)$, etc.

The most important derivatives are:

Derivatives:
$(x^m)' = m\,x^{m-1}$
$(e^x)' = e^x$
$(\ln x)' = 1/x$
$C' = 0$
$(f + g)'(x) = f'(x) + g'(x)$
$(Cf)'(x) = C\,f'(x)$
$(f(x)\,g(x))' = f'(x)\,g(x) + f(x)\,g'(x)$
$\left(\frac{f(x)}{g(x)}\right)' = \frac{f'(x)\,g(x) - f(x)\,g'(x)}{g^2(x)}$
for any functions f and g and any number C.

To differentiate a composite function

$f(x) = g(h(x))$, we use the chain rule:

Chain rule: $\frac{d}{dx}\, g(h(x)) = g'(h(x))\, h'(x)$

FIGURE 13.1: Derivative is the slope of a tangent line. (The figure shows the graph of f(x) with tangent lines at points $x_1, x_2, x_3, x_4$; the tangents at $x_2$ and $x_3$ are flat, with slope 0, while the tangents at $x_1$ and $x_4$ have slopes $f'(x_1)$ and $f'(x_4)$.)

For example,
$$\frac{d}{dx}\,\ln^3(x) = 3 \ln^2(x)\,\frac{1}{x}.$$

Geometrically, the derivative $f'(x)$ equals the slope of a tangent line at point x; see Figure 13.1.
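The derivative rules above are easy to verify with finite differences. The following Python sketch (our illustration) checks the chain-rule example numerically:

    import math

    # Central-difference approximation of d/dx ln^3(x) at x = 2,
    # compared with the chain-rule answer 3 ln^2(x) / x.
    def F(x):
        return math.log(x) ** 3

    x, h = 2.0, 1e-6
    numeric = (F(x + h) - F(x - h)) / (2 * h)
    exact = 3 * math.log(x) ** 2 / x
    print(numeric, exact)   # agree to about six digits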

Computing maxima and minima

At the points where a differentiable function reaches its minimum or maximum, the tangent line is always flat; see points x2 and x3 on Figure 13.1. The slope of a horizontal line is 0, and thus,

$f'(x) = 0$ at these points.

To find out where a function is maximized or minimized, we consider

– solutions of the equation $f'(x) = 0$,
– points x where $f'(x)$ fails to exist,
– endpoints.

The highest and the lowest values of the function can only be attained at these points.
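As an illustration of this recipe (a hypothetical example, not from the text): for $f(x) = x e^{-x}$ on [0, 5], the equation $f'(x) = (1 - x)e^{-x} = 0$ gives x = 1, so the candidate points are the endpoints 0 and 5 and the critical point 1. A short Python sketch:

    import math

    # Candidate points for f(x) = x * exp(-x) on [0, 5]:
    # endpoints 0 and 5, plus x = 1 where f'(x) = (1 - x) exp(-x) = 0.
    def f(x):
        return x * math.exp(-x)

    candidates = [0.0, 1.0, 5.0]
    for x in candidates:
        print(x, f(x))   # the maximum f(1) = 1/e is attained at x = 1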

13.3.5 Integrals

Integration is the operation inverse to differentiation.

A function F (x) is an antiderivative (indefinite integral) of a function f(x) if

$$F'(x) = f(x).$$

Indefinite integrals are defined up to a constant C because, when we take derivatives, $C' = 0$.

Notation: $F(x) = \int f(x)\, dx$

An integral (definite integral) of a function f(x) from point a to point b is the difference of antiderivatives,
$$\int_a^b f(x)\, dx = F(b) - F(a).$$

Improper integrals

$$\int_a^{\infty} f(x)\, dx, \qquad \int_{-\infty}^{b} f(x)\, dx, \qquad \int_{-\infty}^{\infty} f(x)\, dx$$
are defined as limits. For example,
$$\int_a^{\infty} f(x)\, dx = \lim_{b \to \infty} \int_a^{b} f(x)\, dx.$$
The most important integrals are:

Indefinite integrals:
$\int x^m\, dx = \frac{x^{m+1}}{m+1}$ for $m \neq -1$
$\int x^{-1}\, dx = \ln(x)$
$\int e^x\, dx = e^x$
$\int (f(x) + g(x))\, dx = \int f(x)\, dx + \int g(x)\, dx$
$\int C f(x)\, dx = C \int f(x)\, dx$
for any functions f and g and any number C.

For example, to evaluate the definite integral $\int_0^2 x^3\, dx$, we find an antiderivative $F(x) = x^4/4$ and compute $F(2) - F(0) = 4 - 0 = 4$. A standard way to write this solution is
$$\int_0^2 x^3\, dx = \frac{x^4}{4}\,\Big|_{x=0}^{x=2} = \frac{2^4}{4} - \frac{0^4}{4} = 4.$$
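A crude numerical check (our Python sketch, using a midpoint Riemann sum):

    # Midpoint Riemann sum for the integral of x^3 over [0, 2]; the exact answer is 4.
    n = 100000
    a, b = 0.0, 2.0
    dx = (b - a) / n
    total = sum((a + (k + 0.5) * dx) ** 3 * dx for k in range(n))
    print(total)   # approximately 4.0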

Two important integration skills are integration by substitution and integration by parts.

Integration by substitution

An integral often simplifies when we can denote a part of the function as a new variable (y). The limits of integration a and b are then recomputed in terms of y, and dx is replaced by
$$dx = \frac{dy}{dy/dx} \quad \text{or} \quad dx = \frac{dx}{dy}\, dy,$$
whichever is easier to find. Notice that $dx/dy$ is the derivative of the inverse function $x(y)$.

Integration by substitution: $\int f(x)\, dx = \int f(x(y))\, \frac{dx}{dy}\, dy$

For example,

$$\int_{-1}^{2} e^{3x}\, dx = \int_{-3}^{6} e^{y}\left(\frac{1}{3}\right) dy = \frac{e^y}{3}\,\Big|_{y=-3}^{y=6} = \frac{e^6 - e^{-3}}{3} = 134.5.$$

Here we substituted $y = 3x$, recomputed the limits of integration, found the inverse function $x = y/3$ and its derivative $dx/dy = 1/3$.

In the next example, we substitute $y = x^2$. The derivative of this substitution is $dy/dx = 2x$:

$$\int_0^2 x\,e^{x^2} dx = \int_0^4 x\,e^{y}\,\frac{dy}{2x} = \frac{1}{2}\int_0^4 e^y\, dy = \frac{e^y}{2}\,\Big|_{y=0}^{y=4} = \frac{e^4 - 1}{2} = 26.8.$$
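Both substitution examples can be confirmed numerically; a minimal Python sketch (the helper midpoint is ours):

    import math

    # Midpoint Riemann sums for the two examples above.
    def midpoint(f, a, b, n=200000):
        dx = (b - a) / n
        return sum(f(a + (k + 0.5) * dx) for k in range(n)) * dx

    print(midpoint(lambda x: math.exp(3 * x), -1, 2))      # about (e^6 - e^-3)/3 = 134.5
    print(midpoint(lambda x: x * math.exp(x * x), 0, 2))   # about (e^4 - 1)/2 = 26.8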

Integration by parts

This technique often helps to integrate a product of two functions. One of the parts is integrated, the other is differentiated.

Integration by parts: $\int f'(x)\,g(x)\, dx = f(x)\,g(x) - \int f(x)\,g'(x)\, dx$

Applying this method is reasonable only when the function $f g'$ is simpler to integrate than the initial function $f' g$.

In the following example, we let $f'(x) = e^x$ be one part and $g(x) = x$ be the other. Then

FIGURE 13.2: Integrals are areas under the graph of f(x). (The figure shows two shaded areas under the graph: one equal to $\int_a^b f(x)\,dx$, the other equal to $\int_c^{\infty} f(x)\,dx$.)

$f'(x)$ is integrated, and its antiderivative is $f(x) = e^x$. The other part $g(x)$ is differentiated, and $g'(x) = x' = 1$. The integral simplifies, and we can evaluate it,

$$\int x e^x\, dx = x e^x - \int (1)(e^x)\, dx = x e^x - e^x.$$
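To double-check the antiderivative, we can compare $F(b) - F(a)$ with a direct numerical integral over, say, [0, 1], where the exact value is $F(1) - F(0) = 0 - (-1) = 1$ (a Python sketch of ours):

    import math

    # F(x) = x e^x - e^x is the antiderivative found above; F(1) - F(0)
    # should match a midpoint Riemann sum of x e^x over [0, 1].
    def F(x):
        return x * math.exp(x) - math.exp(x)

    n = 100000
    dx = 1.0 / n
    riemann = sum((k + 0.5) * dx * math.exp((k + 0.5) * dx) * dx for k in range(n))
    print(F(1) - F(0), riemann)   # both close to 1.0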

Computing areas

The area under the graph of a positive function f(x) and above the interval [a, b] equals the integral,
$$(\text{area from } a \text{ to } b) = \int_a^b f(x)\, dx.$$
Here a and b may be finite or infinite; see Figure 13.2.

Gamma function and factorial

The Gamma function is defined as
$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx \quad \text{for } t > 0.$$
Taking this integral by parts, we obtain two important properties of the Gamma function:
$$\Gamma(t+1) = t\,\Gamma(t) \ \text{ for any } t > 0, \qquad \Gamma(t+1) = t! = 1 \cdot 2 \cdots t \ \text{ for integer } t.$$
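Both properties are easy to confirm numerically with Python's standard library (our sketch):

    import math

    # Gamma(t+1) = t * Gamma(t) for a few non-integer t,
    # and Gamma(n+1) = n! for small integers n.
    for t in [0.5, 1.7, 3.2]:
        print(math.gamma(t + 1), t * math.gamma(t))
    for n in range(1, 6):
        print(n, math.gamma(n + 1), math.factorial(n))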