Bayesian Analysis 1 Introduction
Recommended publications
-
The Exponential Family 1 Definition
The Exponential Family. David M. Blei, Columbia University. November 9, 2016.

The exponential family is a class of densities (Brown, 1986). It encompasses many familiar forms of likelihoods, such as the Gaussian, Poisson, multinomial, and Bernoulli. It also encompasses their conjugate priors, such as the Gamma, Dirichlet, and beta.

1 Definition

A probability density in the exponential family has this form

$$p(x \mid \eta) = h(x) \exp\{\eta^\top t(x) - a(\eta)\}, \qquad (1)$$

where $\eta$ is the natural parameter; $t(x)$ are sufficient statistics; $h(x)$ is the "base measure;" $a(\eta)$ is the log normalizer.

Examples of exponential family distributions include Gaussian, gamma, Poisson, Bernoulli, multinomial, Markov models. Examples of distributions that are not in this family include student-t, mixtures, and hidden Markov models. (We are considering these families as distributions of data. The latent variables are implicitly marginalized out.)

The statistic $t(x)$ is called sufficient because the probability as a function of $\eta$ only depends on $x$ through $t(x)$.

The exponential family has fundamental connections to the world of graphical models (Wainwright and Jordan, 2008). For our purposes, we'll use exponential families as components in directed graphical models, e.g., in the mixtures of Gaussians.

The log normalizer ensures that the density integrates to 1,

$$a(\eta) = \log \int h(x) \exp\{\eta^\top t(x)\}\, dx. \qquad (2)$$

This is the negative logarithm of the normalizing constant.

The function $h(x)$ can be a source of confusion. One way to interpret $h(x)$ is the (unnormalized) distribution of $x$ when $\eta = 0$. It might involve statistics of $x$ that are not in $t(x)$, i.e., that do not vary with the natural parameter.
-
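As a hedged illustration of the exponential-family form in the excerpt above, the following minimal sketch writes the Bernoulli distribution as $h(x)\exp\{\eta\, t(x) - a(\eta)\}$. It assumes the standard Bernoulli choices $h(x) = 1$, $t(x) = x$, $\eta = \log(p/(1-p))$ and $a(\eta) = \log(1 + e^{\eta})$; the function names are illustrative and do not come from the source.

```python
import math


def bernoulli_natural_param(p):
    """Natural parameter eta = log(p / (1 - p)) for success probability p."""
    return math.log(p / (1.0 - p))


def log_normalizer(eta):
    """Log normalizer a(eta) = log(1 + exp(eta)) for the Bernoulli."""
    return math.log1p(math.exp(eta))


def bernoulli_expfam_pmf(x, eta):
    """Evaluate h(x) * exp(eta * t(x) - a(eta)) with h(x) = 1 and t(x) = x."""
    return math.exp(eta * x - log_normalizer(eta))


p = 0.3
eta = bernoulli_natural_param(p)
probs = [bernoulli_expfam_pmf(x, eta) for x in (0, 1)]

# The two values should match 1 - p and p, and sum to 1.
print(probs, sum(probs))
```

The log normalizer is what makes the two probabilities sum to one, which is the role equation (2) assigns to it in the general case.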
A Skew Extension of the T-Distribution, with Applications
J. R. Statist. Soc. B (2003) 65, Part 1, pp. 159–174. A skew extension of the t-distribution, with applications. M. C. Jones, The Open University, Milton Keynes, UK, and M. J. Faddy, University of Birmingham, UK. [Received March 2000. Final revision July 2002]

Summary. A tractable skew t-distribution on the real line is proposed. This includes as a special case the symmetric t-distribution, and otherwise provides skew extensions thereof. The distribution is potentially useful both for modelling data and in robustness studies. Properties of the new distribution are presented. Likelihood inference for the parameters of this skew t-distribution is developed. Application is made to two data modelling examples.

Keywords: Beta distribution; Likelihood inference; Robustness; Skewness; Student's t-distribution

1. Introduction

Student's t-distribution occurs frequently in statistics. Its usual derivation and use is as the sampling distribution of certain test statistics under normality, but increasingly the t-distribution is being used in both frequentist and Bayesian statistics as a heavy-tailed alternative to the normal distribution when robustness to possible outliers is a concern. See Lange et al. (1989) and Gelman et al. (1995) and references therein. It will often be useful to consider a further alternative to the normal or t-distribution which is both heavy tailed and skew. To this end, we propose a family of distributions which includes the symmetric t-distributions as special cases, and also includes extensions of the t-distribution, still taking values on the whole real line, with non-zero skewness. Let a > 0 and b > 0 be parameters.
-
On a Problem Connected with Beta and Gamma Distributions by R
ON A PROBLEM CONNECTED WITH BETA AND GAMMA DISTRIBUTIONS, by R. G. Laha

1. Introduction. The random variable $X$ is said to have a Gamma distribution $G(x; \theta, \alpha)$ if

$$P(X \le x) = G(x; \theta, \alpha) = \begin{cases} \dfrac{\theta^{\alpha}}{\Gamma(\alpha)} \displaystyle\int_0^x e^{-\theta u} u^{\alpha - 1}\, du & \text{for } x > 0, \\ 0 & \text{for } x \le 0, \end{cases} \qquad (1.1)$$

where $\theta > 0$, $\alpha > 0$. Let $X$ and $Y$ be two independently and identically distributed random variables each having a Gamma distribution of the form (1.1). Then it is well known [1, pp. 243-244] that the random variable $W = X/(X + Y)$ has a Beta distribution $B(w; \alpha, \alpha)$ given by

$$P(W \le w) = B(w; \alpha, \alpha) = \begin{cases} 0 & \text{for } w \le 0, \\ \dfrac{\Gamma(2\alpha)}{\Gamma(\alpha)\Gamma(\alpha)} \displaystyle\int_0^w u^{\alpha - 1}(1 - u)^{\alpha - 1}\, du & \text{for } 0 < w < 1, \\ 1 & \text{for } w \ge 1. \end{cases} \qquad (1.2)$$

Now we can state the converse problem as follows: Let $X$ and $Y$ be two independently and identically distributed random variables having a common distribution function $F(x)$. Suppose that $W = X/(X + Y)$ has a Beta distribution of the form (1.2). Then the question is whether $F(x)$ is necessarily a Gamma distribution of the form (1.1). This problem was posed by Mauldon in [9]. He also showed that the converse problem is not true in general and constructed an example of a non-Gamma distribution with this property using the solution of an integral equation which was studied by Goodspeed in [2]. In the present paper we carry out a systematic investigation of this problem. In §2, we derive some general properties possessed by this class of distribution laws $F(x)$.
-
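The forward direction stated in the excerpt above (two independent Gamma variables with the same shape give a Beta-distributed ratio X/(X + Y)) can be checked numerically. The sketch below is only a Monte Carlo sanity check under assumed settings: shape 2, unit scale, 20,000 draws and a fixed seed are arbitrary choices, and `simulate_ratio` is a hypothetical helper name.

```python
import random


def simulate_ratio(alpha, n, seed=0):
    """Draw n values of W = X / (X + Y) with X, Y iid Gamma(shape=alpha, scale=1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        x = rng.gammavariate(alpha, 1.0)  # second argument is the scale
        y = rng.gammavariate(alpha, 1.0)
        draws.append(x / (x + y))
    return draws


alpha = 2.0
w = simulate_ratio(alpha, n=20000)

mean_w = sum(w) / len(w)
var_w = sum((v - mean_w) ** 2 for v in w) / len(w)

# A Beta(alpha, alpha) variable has mean 1/2 and variance 1 / (4 * (2 * alpha + 1)).
beta_var = 1.0 / (4.0 * (2.0 * alpha + 1.0))
print(mean_w, var_w, beta_var)
```

Matching the first two moments does not prove the distributional claim; it only shows the simulation is consistent with it. The ratio does not depend on the common scale, which is why the scale is fixed at 1 here.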
Statistical Modelling
Statistical Modelling. Dave Woods and Antony Overstall (Chapters 1–2 closely based on original notes by Anthony Davison and Jon Forster). © 2016

1. Model Selection
  Overview
  Basic Ideas
    Why model?
    Criteria for model selection
    Motivation
    Setting
    Logistic regression
    Nodal involvement
    Log likelihood
    Wrong model
    Out-of-sample prediction
    Information criteria
    Nodal involvement
    Theoretical aspects
    Properties of AIC, NIC, BIC
  Linear Model
    Variable selection
    Stepwise methods
    Nuclear power station data
-
Introduction to Bayesian Estimation
Introduction to Bayesian Estimation. March 2, 2016

The Plan
1. What is Bayesian statistics? How is it different from frequentist methods?
2. 4 Bayesian Principles
3. Overview of main concepts in Bayesian analysis

Main reading: Ch. 1 in Gary Koop's Bayesian Econometrics
Additional material: Christian P. Robert's The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation; Gelman, Carlin, Stern, Dunson, Vehtari and Rubin's Bayesian Data Analysis

Probability and statistics: what's the difference?
Probability is a branch of mathematics
• There is little disagreement about whether the theorems follow from the axioms
Statistics is an inversion problem: What is a good probabilistic description of the world, given the observed outcomes?
• There is some disagreement about how we interpret data/observations and how we make inference about unobservable parameters

Why probabilistic models?
Is the world characterized by randomness?
• Is the weather random?
• Is a coin flip random?
• ECB interest rates?
It is difficult to say with certainty whether something is "truly" random.

Two schools of statistics
What is the meaning of probability, randomness and uncertainty? Two main schools of thought:
• The classical (or frequentist) view is that probability corresponds to the frequency of occurrence in repeated experiments
• The Bayesian view is that probabilities are statements about our state of knowledge, i.e. a subjective view.
The difference has implications for how we interpret estimated statistical models and there is no general
-
Hybrid of Naive Bayes and Gaussian Naive Bayes for Classification: a Map Reduce Approach
International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Volume-8, Issue-6S3, April 2019. Hybrid of Naive Bayes and Gaussian Naive Bayes for Classification: A Map Reduce Approach. Shikha Agarwal, Balmukumd Jha, Tisu Kumar, Manish Kumar, Prabhat Ranjan

Abstract: Naive Bayes classifier is well known machine learning algorithm which has shown virtues in many fields. In this work big data analysis platforms like Hadoop distributed computing and map reduce programming is used with Naive Bayes and Gaussian Naive Bayes for classification. Naive Bayes is mainly popular for classification of discrete data sets while Gaussian is used to classify data that has continuous attributes. Experimental results show that Hybrid of Naive Bayes and Gaussian Naive Bayes MapReduce model shows the better performance in terms of classification accuracy on adult data set which has many continuous attributes.

Index Terms: Naive Bayes, Gaussian Naive Bayes, Map Reduce, Classification.

I. INTRODUCTION

The fundamental property of Naive Bayes is that it works on discrete value.

... model could be used without accepting Bayesian probability or using any Bayesian methods. Despite their naive design and simple assumptions, Naive Bayes classifiers have worked quite well in many complex situations. An analysis of the Bayesian classification problem showed that there are sound theoretical reasons behind the apparently implausible efficacy of types of classifiers [5]. A comprehensive comparison with other classification algorithms in 2006 showed that Bayes classification is outperformed by other approaches, such as boosted trees or random forests [6]. An advantage of Naive Bayes is that it only requires a small number of training data to estimate the parameters necessary for classification.
-
Adjusted Probability Naive Bayesian Induction
Adjusted Probability Naive Bayesian Induction. Geoffrey I. Webb (1) and Michael J. Pazzani (2). (1) School of Computing and Mathematics, Deakin University, Geelong, Vic, 3217, Australia. (2) Department of Information and Computer Science, University of California, Irvine, Irvine, Ca, 92717, USA.

Abstract. Naive Bayesian classifiers utilise a simple mathematical model for induction. While it is known that the assumptions on which this model is based are frequently violated, the predictive accuracy obtained in discriminate classification tasks is surprisingly competitive in comparison to more complex induction techniques. Adjusted probability naive Bayesian induction adds a simple extension to the naive Bayesian classifier. A numeric weight is inferred for each class. During discriminate classification, the naive Bayesian probability of a class is multiplied by its weight to obtain an adjusted value. The use of this adjusted value in place of the naive Bayesian probability is shown to significantly improve predictive accuracy.

1 Introduction

The naive Bayesian classifier (Duda & Hart, 1973) provides a simple approach to discriminate classification learning that has demonstrated competitive predictive accuracy on a range of learning tasks (Clark & Niblett, 1989; Langley, P., Iba, W., & Thompson, 1992). The naive Bayesian classifier is also attractive as it has an explicit and sound theoretical basis which guarantees optimal induction given a set of explicit assumptions. There is a drawback, however, in that it is known that some of these assumptions will be violated in many induction scenarios. In particular, one key assumption that is frequently violated is that the attributes are independent with respect to the class variable.
-
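The adjustment step described in the abstract above (multiply each class's naive Bayesian probability by a per-class weight, then pick the largest adjusted value) can be sketched as follows. This is a toy illustration only: the priors, conditional probabilities and weights are made-up numbers, the weights are simply given rather than inferred as in the paper, and both function names are hypothetical.

```python
from math import prod


def naive_bayes_scores(priors, cond_probs, instance):
    """Unnormalised naive Bayes score P(c) * prod_j P(x_j | c) for each class c."""
    return {
        c: priors[c] * prod(cond_probs[c][j][v] for j, v in enumerate(instance))
        for c in priors
    }


def adjusted_prediction(scores, weights):
    """Multiply each class score by its weight and return the top class."""
    adjusted = {c: scores[c] * weights[c] for c in scores}
    return max(adjusted, key=adjusted.get), adjusted


# Made-up model with two classes and two binary attributes.
priors = {"pos": 0.5, "neg": 0.5}
cond_probs = {
    "pos": [{"y": 0.8, "n": 0.2}, {"y": 0.6, "n": 0.4}],
    "neg": [{"y": 0.5, "n": 0.5}, {"y": 0.7, "n": 0.3}],
}
instance = ("y", "y")

scores = naive_bayes_scores(priors, cond_probs, instance)
plain_class = max(scores, key=scores.get)

# Assumed class weights; the paper infers these from data.
weights = {"pos": 1.0, "neg": 1.5}
adjusted_class, adjusted = adjusted_prediction(scores, weights)

print(plain_class, adjusted_class)
```

With these numbers the unadjusted scores favour "pos" (0.24 against 0.175), while the weight of 1.5 on "neg" lifts its adjusted value to 0.2625 and flips the prediction, which is the kind of effect the weights are meant to have.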
A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess
Symmetry. Article. A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess. Guillermo Martínez-Flórez 1, Víctor Leiva 2,*, Emilio Gómez-Déniz 3 and Carolina Marchant 4

1 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 14014, Colombia; [email protected]
2 Escuela de Ingeniería Industrial, Pontificia Universidad Católica de Valparaíso, 2362807 Valparaíso, Chile
3 Facultad de Economía, Empresa y Turismo, Universidad de Las Palmas de Gran Canaria and TIDES Institute, 35001 Canarias, Spain; [email protected]
4 Facultad de Ciencias Básicas, Universidad Católica del Maule, 3466706 Talca, Chile; [email protected]
* Correspondence: [email protected] or [email protected]

Received: 30 June 2020; Accepted: 19 August 2020; Published: 1 September 2020

Abstract: In this paper, we consider skew-normal distributions for constructing a new distribution which allows us to model proportions and rates with zero/one inflation as an alternative to the inflated beta distributions. The new distribution is a mixture between a Bernoulli distribution for explaining the zero/one excess and a censored skew-normal distribution for the continuous variable. The maximum likelihood method is used for parameter estimation. Observed and expected Fisher information matrices are derived to conduct likelihood-based inference in this new type skew-normal distribution. Given the flexibility of the new distributions, we are able to show, in real data scenarios, the good performance of our proposal.

Keywords: beta distribution; centered skew-normal distribution; maximum-likelihood methods; Monte Carlo simulations; proportions; R software; rates; zero/one inflated data

1.
-
Basics of Bayesian Econometrics
Basics of Bayesian Econometrics. Notes for Summer School, Moscow State University, Faculty of Economics. Andrey Simonov (1). June 2013. © Andrey D. Simonov, 2013
(1) University of Chicago, Booth School of Business. All errors and typos are of my own. Please report these as well as any other questions to [email protected].

Contents
1 Introduction
  1.1 Frequentists vs Bayesians
  1.2 Road Map
  1.3 Attributions and Literature
  1.4 Things out of scope
2 Foundations
  2.1 Bayes' Theorem and its Ingredients
    2.1.1 Bayes' Theorem
    2.1.2 Predictive Distribution
    2.1.3 Point estimator
  2.2 Two Fundamental Principles
    2.2.1 Sufficiency Principle
    2.2.2 Likelihood Principle
  2.3 Conjugate distributions
    2.3.1 Example: Normal and Normal
    2.3.2 Example: Binomial and Beta
    2.3.3 Asymptotic Equivalence
  2.4 Bayesian Regressions
    2.4.1 Multiple Regression
    2.4.2 Multivariate Regression
    2.4.3 Seemingly Unrelated Regression
3 MCMC Methods
  3.1 Monte Carlo simulation
    3.1.1 Importance sampling
    3.1.2 Methods of simulations: frequently used distributions
  3.2 Basic Markov Chain Theory
  3.3 Monte Carlo Markov Chain
  3.4 Methods and Algorithms
    3.4.1 Gibbs Sampler
    3.4.2 Gibbs and SUR
    3.4.3 Data Augmentation
    3.4.4 Hierarchical Models
    3.4.5 Mixture of Normals
    3.4.6 Metropolis-Hastings Algorithm
-
Bayesian Macroeconometrics
Bayesian Macroeconometrics. Marco Del Negro (Federal Reserve Bank of New York) and Frank Schorfheide* (University of Pennsylvania, CEPR and NBER). April 18, 2010. Prepared for Handbook of Bayesian Econometrics

* Correspondence: Marco Del Negro: Research Department, Federal Reserve Bank of New York, 33 Liberty Street, New York NY 10045: [email protected]. Frank Schorfheide: Department of Economics, 3718 Locust Walk, University of Pennsylvania, Philadelphia, PA 19104-6297. Email: [email protected]. The views expressed in this chapter do not necessarily reflect those of the Federal Reserve Bank of New York or the Federal Reserve System. Ed Herbst and Maxym Kryshko provided excellent research assistance. We are thankful for the feedback received from the editors of the Handbook John Geweke, Gary Koop, and Herman van Dijk as well as comments by Giorgio Primiceri, Dan Waggoner, and Tao Zha.

Contents
1 Introduction
  1.1 Challenges for Inference and Decision Making
  1.2 How Can Bayesian Analysis Help?
  1.3 Outline of this Chapter
2 Vector Autoregressions
  2.1 A Reduced-Form VAR
  2.2 Dummy Observations and the Minnesota Prior
  2.3 A Second Reduced-Form VAR
  2.4 Structural VARs
  2.5 Further VAR Topics
3 VARs with Reduced-Rank Restrictions
  3.1 Cointegration Restrictions
  3.2 Bayesian Inference with Gaussian Prior for β
  3.3 Further Research on Bayesian Cointegration Models
4 Dynamic Stochastic General Equilibrium Models
  4.1 A Prototypical DSGE Model
  4.2 Model Solution and State-Space Form
-
Lecture 2 — September 24 2.1 Recap 2.2 Exponential Families
STATS 300A: Theory of Statistics, Fall 2015. Lecture 2, September 24. Lecturer: Lester Mackey. Scribe: Stephen Bates and Andy Tsao

2.1 Recap

Last time, we set out on a quest to develop optimal inference procedures and, along the way, encountered an important pair of assertions: not all data is relevant, and irrelevant data can only increase risk and hence impair performance. This led us to introduce a notion of lossless data compression (sufficiency): $T$ is sufficient for $\mathcal{P}$ with $X \sim P_\theta \in \mathcal{P}$ if $X \mid T(X)$ is independent of $\theta$. How far can we take this idea? At what point does compression impair performance? These are questions of optimal data reduction. While we will develop general answers to these questions in this lecture and the next, we can often say much more in the context of specific modeling choices. With this in mind, let's consider an especially important class of models known as the exponential family models.

2.2 Exponential Families

Definition 1. The model $\{P_\theta : \theta \in \Omega\}$ forms an $s$-dimensional exponential family if each $P_\theta$ has density of the form:

$$p(x; \theta) = \exp\left( \sum_{i=1}^{s} \eta_i(\theta) T_i(x) - B(\theta) \right) h(x)$$

• $\eta_i(\theta) \in \mathbb{R}$ are called the natural parameters.
• $T_i(x) \in \mathbb{R}$ are its sufficient statistics, which follows from NFFC.
• $B(\theta)$ is the log-partition function because it is the logarithm of a normalization factor:

$$B(\theta) = \log\left( \int \exp\left( \sum_{i=1}^{s} \eta_i(\theta) T_i(x) \right) h(x)\, d\mu(x) \right) \in \mathbb{R}$$

• $h(x) \in \mathbb{R}$: base measure.
-
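A short worked instance of Definition 1 may help fix the notation. The Poisson model with mean $\theta > 0$ is used here as an assumed example (it is not taken from the lecture excerpt), with $\mu$ the counting measure on the non-negative integers, so that the integral in $B(\theta)$ becomes a sum.

```latex
\begin{align*}
p(x;\theta) &= \frac{\theta^{x} e^{-\theta}}{x!}
             = \exp\bigl( x \log\theta - \theta \bigr)\, \frac{1}{x!},
             \qquad x = 0, 1, 2, \dots \\
\eta_1(\theta) &= \log\theta, \qquad T_1(x) = x, \qquad
B(\theta) = \theta, \qquad h(x) = \frac{1}{x!}, \\
B(\theta) &= \log \sum_{x=0}^{\infty} \exp\bigl( x \log\theta \bigr)\, \frac{1}{x!}
           = \log \sum_{x=0}^{\infty} \frac{\theta^{x}}{x!}
           = \log e^{\theta} = \theta .
\end{align*}
```

The last line confirms that the $B(\theta)$ read off from the density agrees with the log-partition formula, so this is a one-dimensional ($s = 1$) exponential family with sufficient statistic $T_1(x) = x$.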
Faculty of Economics University of Tor Vergata Rome Bayesian Methods in Microeconometrics May 15 − 17Th 2019
Faculty of Economics, University of Tor Vergata, Rome. Bayesian Methods in Microeconometrics. May 15–17, 2019. Lecturer: M. J. Weeks, Assistant Professor of Economics, University of Cambridge

This course will provide an introduction to simulation-based methods that are commonly used in microeconometrics. The emphasis will be Bayesian, although we will also contrast posterior analysis with maximum likelihood estimation. The course covers a mix of theory and applications.

We will start with a number of theoretical issues including exchangeability, prior-posterior analysis and hypothesis testing. We will also examine the fundamental problem of prior elicitation. We present the basic ideas behind Monte Carlo integration and demonstrate the use of these techniques in both maximising a likelihood and sampling from a posterior distribution.

We will examine the fundamentals of MCMC from both a theoretical and applied setting. We start with some theory, focussing on the integral transform theorem and demonstrating how this provides a route to simulating random variables. We provide a brief introduction to simple Accept-Reject sampling methods, moving onto the Gibbs sampler and Metropolis-Hastings.

Applications will be taken from discrete choice, treatment models, endogeneity and consumer heterogeneity in marketing applications. In each case we consider the particular advantages of a Bayesian approach, how to implement, and how data augmentation can be a useful tool in designing a posterior simulator. We will also provide a short introduction to machine learning with a focus on decision trees for prediction and causal effects. Given time constraints, we will provide examples using Python and R.

Topics: The course will select from the following topics: 1.
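The course description above mentions the integral transform theorem as a route to simulating random variables. A minimal sketch of that idea, assuming an exponential target distribution (chosen only because its CDF inverts in closed form), is given below; `sample_exponential`, the rate of 2, the sample size and the seed are illustrative choices and not part of the course material.

```python
import math
import random


def sample_exponential(rate, n, seed=0):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then
    -log(1 - U) / rate has CDF F(x) = 1 - exp(-rate * x)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]


rate = 2.0
draws = sample_exponential(rate, n=50000)
sample_mean = sum(draws) / len(draws)

# The exponential distribution with this rate has mean 1 / rate.
print(sample_mean, 1.0 / rate)
```

The same recipe applies to any distribution whose CDF can be inverted; when it cannot, methods such as the accept-reject samplers named in the description are the usual alternative.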