STAT 618 Bayesian Statistics Lecture Notes Michael Baron


Michael Baron
American University
Department of Mathematics & Statistics
Office: DMTI 106-D
Office hours: Mon, Wed, Thu 4:00-5:00 pm
Phone: 202-885-3130
Email: [email protected]

Contents

1 Introduction to Decision Theory and Bayesian Philosophy
  1.1 Decisions and Action Spaces
  1.2 Frequentist Approach
  1.3 Bayesian approach
    1.3.1 Prior and posterior
  1.4 Loss function
    1.4.1 Risks and optimal statistical decisions

2 Review of Some Probability
  2.1 Rules of Probability
  2.2 Discrete random variables
    2.2.1 Joint distribution and marginal distributions
    2.2.2 Expectation, variance, and covariance
  2.3 Discrete distribution families
  2.4 Continuous random variables

3 Review of Basic Statistics
  3.1 Parameters and their estimates
    3.1.1 Mean
    3.1.2 Median
    3.1.3 Quantiles
    3.1.4 Variance and standard deviation
    3.1.5 Method of moments
    3.1.6 Method of maximum likelihood
  3.2 Confidence intervals
  3.3 Testing Hypotheses
    3.3.1 Level α tests
    3.3.2 Z-tests
    3.3.3 T-tests
    3.3.4 Duality: two-sided tests and two-sided confidence intervals
    3.3.5 P-value

4 Choice of a Prior Distribution
  4.1 Subjective choice of a prior distribution
  4.2 Empirical Bayes solutions
  4.3 Conjugate priors
    4.3.1 Gamma family is conjugate to the Poisson model
    4.3.2 Beta family is conjugate to the Binomial model
    4.3.3 Normal family is conjugate to the Normal model
    4.3.4 Combining the data and the prior
  4.4 Non-informative prior distributions and generalized Bayes rules
    4.4.1 Invariant non-informative priors

5 Loss Functions, Utility Theory, and Rao-Blackwellization
  5.1 Utility Theory and Subjectively Chosen Loss Functions
    5.1.1 Construction of a utility function
  5.2 Standard loss functions and corresponding Bayes decision rules
    5.2.1 Squared-error loss
    5.2.2 Absolute-error loss
    5.2.3 Zero-one loss
    5.2.4 Other loss functions
  5.3 Convexity and the Rao-Blackwell Theorem
    5.3.1 Convex functions
    5.3.2 Jensen's Inequality
    5.3.3 Sufficient statistics
    5.3.4 Rao-Blackwell Theorem

6 Intro to WinBUGS
  6.1 Installation
  6.2 Basic Program: Model, Data, and Initialization
    6.2.1 Model specification
  6.3 Entering Data
    6.3.1 Initialization

7 Bayesian Inference: Estimation, Hypothesis Testing, Prediction
  7.1 Bayesian estimation
    7.1.1 Precision evaluation
  7.2 Bayesian credible sets
  7.3 Bayesian hypothesis testing
    7.3.1 Zero-wi loss function
    7.3.2 Bayesian classification
    7.3.3 Bayes factors
  7.4 Bayesian prediction

8 Monte Carlo methods
  8.1 Applications and examples
  8.2 Estimating probabilities
  8.3 Estimating means and standard deviations
  8.4 Forecasting
  8.5 Monte Carlo integration
  8.6 Markov Chains Monte Carlo
  8.7 Gibbs Sampler
  8.8 Central Limit Theorem for the Posterior Distribution
    8.8.1 Preparation
    8.8.2 Main result, asymptotic normality

9 Computer Generation of Random Variables
  9.1 Random number generators
  9.2 Discrete methods
  9.3 Inverse transform method
  9.4 Rejection method
  9.5 Generation of random vectors
  9.6 Gibbs Sampler
  9.7 The Metropolis Algorithm
  9.8 The Metropolis-Hastings Algorithm

10 Review of Markov Chains
  10.1 Main definitions
  10.2 Markov chains
  10.3 Matrix approach
  10.4 Steady-state distribution

11 Empirical and Hierarchical Bayes Analysis
  11.1 Parametric Empirical Bayes Approach
  11.2 Nonparametric Empirical Bayes Approach
  11.3 Hierarchical Bayes Approach

12 Minimax Decisions and Game Theory
  12.1 General concepts
  12.2 Ann, Jim, and Mark problem

13 Appendix
  13.1 Inventory of distributions
    13.1.1 Discrete families
    13.1.2 Continuous families
  13.2 Distribution tables
  13.3 Calculus review
    13.3.1 Inverse function
    13.3.2 Limits and continuity
    13.3.3 Sequences and series
    13.3.4 Derivatives, minimum, and maximum
    13.3.5 Integrals

Chapter 1
Introduction to Decision Theory and Bayesian Philosophy

1.1 Decisions and Action Spaces
1.2 Frequentist Approach
1.3 Bayesian approach
  1.3.1 Prior and posterior
1.4 Loss function
  1.4.1 Risks and optimal statistical decisions

Welcome to the new semester! And to this course, where we discuss the main concepts and methods of

– Statistical decision theory
– Bayesian modeling
– Bayesian decision making - estimation, hypothesis testing, and forecasting
– Bayesian modeling and computation using R and BUGS
– Minimax approach and game theory

First, let us introduce the main concepts and discuss how this course is different from other Statistics courses that you took, will take, or ... will not take.

1.1 Decisions and Action Spaces

Practically, in all Statistics courses we learn how to make decisions under uncertainty. Formally, we are looking for a decision δ that belongs to an action space A - the set of all possible decisions that we are allowed to take.

Example 1.1 (Estimation).
In estimation of parameters, our decision is an estimator δ = θ̂ of some unknown parameter θ, and the action space A consists of all possible values of this parameter. If we are estimating the mean of a Normal distribution, A = (−∞, ∞). But if we are estimating its variance, then A = (0, ∞). To estimate the proportion of voters supporting a certain candidate, we should take A = [0, 1]. ♦

Example 1.2 (Testing). When we test a hypothesis, at the end of the day we have to either accept or reject it. Then the action space consists of just two elements,
A = { accept the null hypothesis H0, reject the hypothesis H0 in favor of the alternative HA }. ♦

Example 1.3 (Investments). There are other kinds of problems. When making an investment, we have to decide when to invest, where to invest, and how much. A combination of these will be our decision. ♦

Notation
δ = decision
A = action space; δ ∈ A
θ = parameter
H0 = null hypothesis
HA = alternative hypothesis

1.2 Frequentist Approach

Now, what kind of information do we use when making these decisions? Is it a silly question? We know that statisticians collect random samples of data and do their statistics based on them. So, their decisions are functions of data,
δ = δ(data) = δ(X1, ..., Xn).
This is the frequentist approach. According to it, uncertainty comes from a random sample and its distribution. The only considered distributions, expectations, and variances are distributions, expectations, and variances of data and various statistics computed from data. Population parameters are considered fixed. Statistical procedures are based on the distribution of data given these parameters,
f(x | θ) = f(X1, ..., Xn | θ).
Properties of these procedures can be stated in terms of long-run frequencies. For example (review Chapter 3, if needed),

– an estimator θ̂ is unbiased if in a long run of random samples, it averages to the parameter θ;
– a test has significance level α if in a long run of random samples, 100α% of the time the true hypothesis is rejected;
– an interval has confidence level (1 − α) if in a long run of random samples, (1 − α)100% of obtained confidence intervals contain the parameter, as shown in Figure 3.5, p. 38;
– and so on.

(A small simulation illustrating this long-run coverage interpretation appears at the end of this section.)

However, there are many situations when using only the data is not sufficient for meaningful decisions.

Example 1.4 (Same data, different conclusions (L. Savage)). Imagine three situations:
1. A music expert claims that she can easily distinguish between a line from Mozart and a line from
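Returning to the long-run frequency properties of Section 1.2: they are easy to check by simulation. The following is a minimal sketch in R (the computing language used later in the course), assuming i.i.d. Normal data with known standard deviation; the parameter value, sample size, and number of replications are chosen only for illustration and are not part of the notes.

# Long-run frequency check for Normal(theta, sigma^2) data with sigma known:
# does the sample mean average to theta, and do 95% confidence intervals
# cover theta about 95% of the time? (Illustrative settings throughout.)
set.seed(618)
theta <- 5        # true, fixed parameter
sigma <- 2        # known standard deviation
n     <- 30       # sample size
nrep  <- 10000    # number of repeated samples ("long run")
alpha <- 0.05
z     <- qnorm(1 - alpha / 2)

xbar   <- numeric(nrep)
covers <- logical(nrep)

for (k in 1:nrep) {
  x <- rnorm(n, mean = theta, sd = sigma)     # one random sample
  xbar[k] <- mean(x)                          # its sample mean
  margin  <- z * sigma / sqrt(n)              # half-width of the 95% CI
  covers[k] <- (xbar[k] - margin <= theta) & (theta <= xbar[k] + margin)
}

mean(xbar)    # averages to theta (unbiasedness)
mean(covers)  # proportion of intervals covering theta (confidence level)

With these settings, the two printed values should land near 5 and 0.95, which is precisely the long-run sense in which the estimator is unbiased and the interval has confidence level 1 − α.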