Bayesian Modelling and Inference on Mixtures of Distributions
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Assessing Fairness with Unlabeled Data and Bayesian Inference
Can I Trust My Fairness Metric? Assessing Fairness with Unlabeled Data and Bayesian Inference Disi Ji1 Padhraic Smyth1 Mark Steyvers2 1Department of Computer Science 2Department of Cognitive Sciences University of California, Irvine [email protected] [email protected] [email protected] Abstract We investigate the problem of reliably assessing group fairness when labeled examples are few but unlabeled examples are plentiful. We propose a general Bayesian framework that can augment labeled data with unlabeled data to produce more accurate and lower-variance estimates compared to methods based on labeled data alone. Our approach estimates calibrated scores for unlabeled examples in each group using a hierarchical latent variable model conditioned on labeled examples. This in turn allows for inference of posterior distributions with associated notions of uncertainty for a variety of group fairness metrics. We demonstrate that our approach leads to significant and consistent reductions in estimation error across multiple well-known fairness datasets, sensitive attributes, and predictive models. The results show the benefits of using both unlabeled data and Bayesian inference in terms of assessing whether a prediction model is fair or not. 1 Introduction Machine learning models are increasingly used to make important decisions about individuals. At the same time it has become apparent that these models are susceptible to producing systematically biased decisions with respect to sensitive attributes such as gender, ethnicity, and age [Angwin et al., 2017, Berk et al., 2018, Corbett-Davies and Goel, 2018, Chen et al., 2019, Beutel et al., 2019]. This has led to a significant amount of recent work in machine learning addressing these issues, including research on both (i) definitions of fairness in a machine learning context (e.g., Dwork et al. -
Bayesian Inference Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Aus´ınand Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative Methods Master in Mathematical Engineering Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 1 / 35 Objective AFM Smith Dennis Lindley We analyze the Bayesian approach to fitting normal and generalized linear models and introduce the Bayesian hierarchical modeling approach. Also, we study the modeling and forecasting of time series. Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 2 / 35 Contents 1 Normal linear models 1.1. ANOVA model 1.2. Simple linear regression model 2 Generalized linear models 3 Hierarchical models 4 Dynamic models Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 3 / 35 Normal linear models A normal linear model is of the following form: y = Xθ + ; 0 where y = (y1;:::; yn) is the observed data, X is a known n × k matrix, called 0 the design matrix, θ = (θ1; : : : ; θk ) is the parameter set and follows a multivariate normal distribution. Usually, it is assumed that: 1 ∼ N 0 ; I : k φ k A simple example of normal linear model is the simple linear regression model T 1 1 ::: 1 where X = and θ = (α; β)T . x1 x2 ::: xn Conchi Aus´ınand Mike Wiper Regression and hierarchical models Masters Programmes 4 / 35 Normal linear models Consider a normal linear model, y = Xθ + . A conjugate prior distribution is a normal-gamma distribution: -
A Tail Quantile Approximation Formula for the Student T and the Symmetric Generalized Hyperbolic Distribution
A Service of Leibniz-Informationszentrum econstor Wirtschaft Leibniz Information Centre Make Your Publications Visible. zbw for Economics Schlüter, Stephan; Fischer, Matthias J. Working Paper A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution IWQW Discussion Papers, No. 05/2009 Provided in Cooperation with: Friedrich-Alexander University Erlangen-Nuremberg, Institute for Economics Suggested Citation: Schlüter, Stephan; Fischer, Matthias J. (2009) : A tail quantile approximation formula for the student t and the symmetric generalized hyperbolic distribution, IWQW Discussion Papers, No. 05/2009, Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung (IWQW), Nürnberg This Version is available at: http://hdl.handle.net/10419/29554 Standard-Nutzungsbedingungen: Terms of use: Die Dokumente auf EconStor dürfen zu eigenen wissenschaftlichen Documents in EconStor may be saved and copied for your Zwecken und zum Privatgebrauch gespeichert und kopiert werden. personal and scholarly purposes. Sie dürfen die Dokumente nicht für öffentliche oder kommerzielle You are not to copy documents for public or commercial Zwecke vervielfältigen, öffentlich ausstellen, öffentlich zugänglich purposes, to exhibit the documents publicly, to make them machen, vertreiben oder anderweitig nutzen. publicly available on the internet, or to distribute or otherwise use the documents in public. Sofern die Verfasser die Dokumente unter Open-Content-Lizenzen (insbesondere CC-Lizenzen) zur Verfügung gestellt haben sollten, If the documents have been made available under an Open gelten abweichend von diesen Nutzungsbedingungen die in der dort Content Licence (especially Creative Commons Licences), you genannten Lizenz gewährten Nutzungsrechte. may exercise further usage rights as specified in the indicated licence. www.econstor.eu IWQW Institut für Wirtschaftspolitik und Quantitative Wirtschaftsforschung Diskussionspapier Discussion Papers No. -
The Bayesian Lasso
The Bayesian Lasso Trevor Park and George Casella y University of Florida, Gainesville, Florida, USA Summary. The Lasso estimate for linear regression parameters can be interpreted as a Bayesian posterior mode estimate when the priors on the regression parameters are indepen- dent double-exponential (Laplace) distributions. This posterior can also be accessed through a Gibbs sampler using conjugate normal priors for the regression parameters, with indepen- dent exponential hyperpriors on their variances. This leads to tractable full conditional distri- butions through a connection with the inverse Gaussian distribution. Although the Bayesian Lasso does not automatically perform variable selection, it does provide standard errors and Bayesian credible intervals that can guide variable selection. Moreover, the structure of the hierarchical model provides both Bayesian and likelihood methods for selecting the Lasso pa- rameter. The methods described here can also be extended to other Lasso-related estimation methods like bridge regression and robust variants. Keywords: Gibbs sampler, inverse Gaussian, linear regression, empirical Bayes, penalised regression, hierarchical models, scale mixture of normals 1. Introduction The Lasso of Tibshirani (1996) is a method for simultaneous shrinkage and model selection in regression problems. It is most commonly applied to the linear regression model y = µ1n + Xβ + ; where y is the n 1 vector of responses, µ is the overall mean, X is the n p matrix × T × of standardised regressors, β = (β1; : : : ; βp) is the vector of regression coefficients to be estimated, and is the n 1 vector of independent and identically distributed normal errors with mean 0 and unknown× variance σ2. The estimate of µ is taken as the average y¯ of the responses, and the Lasso estimate β minimises the sum of the squared residuals, subject to a given bound t on its L1 norm. -
Mixture Random Effect Model Based Meta-Analysis for Medical Data
Mixture Random Effect Model Based Meta-analysis For Medical Data Mining Yinglong Xia?, Shifeng Weng?, Changshui Zhang??, and Shao Li State Key Laboratory of Intelligent Technology and Systems,Department of Automation, Tsinghua University, Beijing, China [email protected], [email protected], fzcs,[email protected] Abstract. As a powerful tool for summarizing the distributed medi- cal information, Meta-analysis has played an important role in medical research in the past decades. In this paper, a more general statistical model for meta-analysis is proposed to integrate heterogeneous medi- cal researches efficiently. The novel model, named mixture random effect model (MREM), is constructed by Gaussian Mixture Model (GMM) and unifies the existing fixed effect model and random effect model. The pa- rameters of the proposed model are estimated by Markov Chain Monte Carlo (MCMC) method. Not only can MREM discover underlying struc- ture and intrinsic heterogeneity of meta datasets, but also can imply reasonable subgroup division. These merits embody the significance of our methods for heterogeneity assessment. Both simulation results and experiments on real medical datasets demonstrate the performance of the proposed model. 1 Introduction As the great improvement of experimental technologies, the growth of the vol- ume of scientific data relevant to medical experiment researches is getting more and more massively. However, often the results spreading over journals and on- line database appear inconsistent or even contradict because of variance of the studies. It makes the evaluation of those studies to be difficult. Meta-analysis is statistical technique for assembling to integrate the findings of a large collection of analysis results from individual studies. -
A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess
S S symmetry Article A Family of Skew-Normal Distributions for Modeling Proportions and Rates with Zeros/Ones Excess Guillermo Martínez-Flórez 1, Víctor Leiva 2,* , Emilio Gómez-Déniz 3 and Carolina Marchant 4 1 Departamento de Matemáticas y Estadística, Facultad de Ciencias Básicas, Universidad de Córdoba, Montería 14014, Colombia; [email protected] 2 Escuela de Ingeniería Industrial, Pontificia Universidad Católica de Valparaíso, 2362807 Valparaíso, Chile 3 Facultad de Economía, Empresa y Turismo, Universidad de Las Palmas de Gran Canaria and TIDES Institute, 35001 Canarias, Spain; [email protected] 4 Facultad de Ciencias Básicas, Universidad Católica del Maule, 3466706 Talca, Chile; [email protected] * Correspondence: [email protected] or [email protected] Received: 30 June 2020; Accepted: 19 August 2020; Published: 1 September 2020 Abstract: In this paper, we consider skew-normal distributions for constructing new a distribution which allows us to model proportions and rates with zero/one inflation as an alternative to the inflated beta distributions. The new distribution is a mixture between a Bernoulli distribution for explaining the zero/one excess and a censored skew-normal distribution for the continuous variable. The maximum likelihood method is used for parameter estimation. Observed and expected Fisher information matrices are derived to conduct likelihood-based inference in this new type skew-normal distribution. Given the flexibility of the new distributions, we are able to show, in real data scenarios, the good performance of our proposal. Keywords: beta distribution; centered skew-normal distribution; maximum-likelihood methods; Monte Carlo simulations; proportions; R software; rates; zero/one inflated data 1. -
Posterior Propriety and Admissibility of Hyperpriors in Normal
The Annals of Statistics 2005, Vol. 33, No. 2, 606–646 DOI: 10.1214/009053605000000075 c Institute of Mathematical Statistics, 2005 POSTERIOR PROPRIETY AND ADMISSIBILITY OF HYPERPRIORS IN NORMAL HIERARCHICAL MODELS1 By James O. Berger, William Strawderman and Dejun Tang Duke University and SAMSI, Rutgers University and Novartis Pharmaceuticals Hierarchical modeling is wonderful and here to stay, but hyper- parameter priors are often chosen in a casual fashion. Unfortunately, as the number of hyperparameters grows, the effects of casual choices can multiply, leading to considerably inferior performance. As an ex- treme, but not uncommon, example use of the wrong hyperparameter priors can even lead to impropriety of the posterior. For exchangeable hierarchical multivariate normal models, we first determine when a standard class of hierarchical priors results in proper or improper posteriors. We next determine which elements of this class lead to admissible estimators of the mean under quadratic loss; such considerations provide one useful guideline for choice among hierarchical priors. Finally, computational issues with the resulting posterior distributions are addressed. 1. Introduction. 1.1. The model and the problems. Consider the block multivariate nor- mal situation (sometimes called the “matrix of means problem”) specified by the following hierarchical Bayesian model: (1) X ∼ Np(θ, I), θ ∼ Np(B, Σπ), where X1 θ1 X2 θ2 Xp×1 = . , θp×1 = . , arXiv:math/0505605v1 [math.ST] 27 May 2005 . X θ m m Received February 2004; revised July 2004. 1Supported by NSF Grants DMS-98-02261 and DMS-01-03265. AMS 2000 subject classifications. Primary 62C15; secondary 62F15. Key words and phrases. Covariance matrix, quadratic loss, frequentist risk, posterior impropriety, objective priors, Markov chain Monte Carlo. -
D. Normal Mixture Models and Elliptical Models
D. Normal Mixture Models and Elliptical Models 1. Normal Variance Mixtures 2. Normal Mean-Variance Mixtures 3. Spherical Distributions 4. Elliptical Distributions QRM 2010 74 D1. Multivariate Normal Mixture Distributions Pros of Multivariate Normal Distribution • inference is \well known" and estimation is \easy". • distribution is given by µ and Σ. • linear combinations are normal (! VaR and ES calcs easy). • conditional distributions are normal. > • For (X1;X2) ∼ N2(µ; Σ), ρ(X1;X2) = 0 () X1 and X2 are independent: QRM 2010 75 Multivariate Normal Variance Mixtures Cons of Multivariate Normal Distribution • tails are thin, meaning that extreme values are scarce in the normal model. • joint extremes in the multivariate model are also too scarce. • the distribution has a strong form of symmetry, called elliptical symmetry. How to repair the drawbacks of the multivariate normal model? QRM 2010 76 Multivariate Normal Variance Mixtures The random vector X has a (multivariate) normal variance mixture distribution if d p X = µ + WAZ; (1) where • Z ∼ Nk(0;Ik); • W ≥ 0 is a scalar random variable which is independent of Z; and • A 2 Rd×k and µ 2 Rd are a matrix and a vector of constants, respectively. > Set Σ := AA . Observe: XjW = w ∼ Nd(µ; wΣ). QRM 2010 77 Multivariate Normal Variance Mixtures Assumption: rank(A)= d ≤ k, so Σ is a positive definite matrix. If E(W ) < 1 then easy calculations give E(X) = µ and cov(X) = E(W )Σ: We call µ the location vector or mean vector and we call Σ the dispersion matrix. The correlation matrices of X and AZ are identical: corr(X) = corr(AZ): Multivariate normal variance mixtures provide the most useful examples of elliptical distributions. -
Mixture Modeling with Applications in Alzheimer's Disease
University of Kentucky UKnowledge Theses and Dissertations--Epidemiology and Biostatistics College of Public Health 2017 MIXTURE MODELING WITH APPLICATIONS IN ALZHEIMER'S DISEASE Frank Appiah University of Kentucky, [email protected] Digital Object Identifier: https://doi.org/10.13023/ETD.2017.100 Right click to open a feedback form in a new tab to let us know how this document benefits ou.y Recommended Citation Appiah, Frank, "MIXTURE MODELING WITH APPLICATIONS IN ALZHEIMER'S DISEASE" (2017). Theses and Dissertations--Epidemiology and Biostatistics. 14. https://uknowledge.uky.edu/epb_etds/14 This Doctoral Dissertation is brought to you for free and open access by the College of Public Health at UKnowledge. It has been accepted for inclusion in Theses and Dissertations--Epidemiology and Biostatistics by an authorized administrator of UKnowledge. For more information, please contact [email protected]. STUDENT AGREEMENT: I represent that my thesis or dissertation and abstract are my original work. Proper attribution has been given to all outside sources. I understand that I am solely responsible for obtaining any needed copyright permissions. I have obtained needed written permission statement(s) from the owner(s) of each third-party copyrighted matter to be included in my work, allowing electronic distribution (if such use is not permitted by the fair use doctrine) which will be submitted to UKnowledge as Additional File. I hereby grant to The University of Kentucky and its agents the irrevocable, non-exclusive, and royalty-free license to archive and make accessible my work in whole or in part in all forms of media, now or hereafter known. -
Empirical Bayes Methods for Combining Likelihoods Bradley EFRON
Empirical Bayes Methods for Combining Likelihoods Bradley EFRON Supposethat several independent experiments are observed,each one yieldinga likelihoodLk (0k) for a real-valuedparameter of interestOk. For example,Ok might be the log-oddsratio for a 2 x 2 table relatingto the kth populationin a series of medical experiments.This articleconcerns the followingempirical Bayes question:How can we combineall of the likelihoodsLk to get an intervalestimate for any one of the Ok'S, say 01? The resultsare presented in the formof a realisticcomputational scheme that allows model buildingand model checkingin the spiritof a regressionanalysis. No specialmathematical forms are requiredfor the priorsor the likelihoods.This schemeis designedto take advantageof recentmethods that produceapproximate numerical likelihoodsLk(6k) even in very complicatedsituations, with all nuisanceparameters eliminated. The empiricalBayes likelihood theoryis extendedto situationswhere the Ok'S have a regressionstructure as well as an empiricalBayes relationship.Most of the discussionis presentedin termsof a hierarchicalBayes model and concernshow such a model can be implementedwithout requiringlarge amountsof Bayesianinput. Frequentist approaches, such as bias correctionand robustness,play a centralrole in the methodology. KEY WORDS: ABC method;Confidence expectation; Generalized linear mixed models;Hierarchical Bayes; Meta-analysis for likelihoods;Relevance; Special exponential families. 1. INTRODUCTION for 0k, also appearing in Table 1, is A typical statistical analysis blends data from indepen- dent experimental units into a single combined inference for a parameter of interest 0. Empirical Bayes, or hierar- SDk ={ ak +.S + bb + .5 +k + -5 dk + .5 chical or meta-analytic analyses, involve a second level of SD13 = .61, for example. data acquisition. Several independent experiments are ob- The statistic 0k is an estimate of the true log-odds ratio served, each involving many units, but each perhaps having 0k in the kth experimental population, a different value of the parameter 0. -
A Two-Component Normal Mixture Alternative to the Fay-Herriot Model
A two-component normal mixture alternative to the Fay-Herriot model August 22, 2015 Adrijo Chakraborty 1, Gauri Sankar Datta 2 3 4 and Abhyuday Mandal 5 Abstract: This article considers a robust hierarchical Bayesian approach to deal with ran- dom effects of small area means when some of these effects assume extreme values, resulting in outliers. In presence of outliers, the standard Fay-Herriot model, used for modeling area- level data, under normality assumptions of the random effects may overestimate random effects variance, thus provides less than ideal shrinkage towards the synthetic regression pre- dictions and inhibits borrowing information. Even a small number of substantive outliers of random effects result in a large estimate of the random effects variance in the Fay-Herriot model, thereby achieving little shrinkage to the synthetic part of the model or little reduction in posterior variance associated with the regular Bayes estimator for any of the small areas. While a scale mixture of normal distributions with known mixing distribution for the random effects has been found to be effective in presence of outliers, the solution depends on the mixing distribution. As a possible alternative solution to the problem, a two-component nor- mal mixture model has been proposed based on noninformative priors on the model variance parameters, regression coefficients and the mixing probability. Data analysis and simulation studies based on real, simulated and synthetic data show advantage of the proposed method over the standard Bayesian Fay-Herriot solution derived under normality of random effects. 1NORC at the University of Chicago, Bethesda, MD 20814, [email protected] 2Department of Statistics, University of Georgia, Athens, GA 30602, USA. -
A New Hyperprior Distribution for Bayesian Regression Model with Application in Genomics
bioRxiv preprint doi: https://doi.org/10.1101/102244; this version posted January 22, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. A New Hyperprior Distribution for Bayesian Regression Model with Application in Genomics Renato Rodrigues Silva 1,* 1 Institute of Mathematics and Statistics, Federal University of Goi´as, Goi^ania,Goi´as,Brazil *[email protected] Abstract In the regression analysis, there are situations where the model have more predictor variables than observations of dependent variable, resulting in the problem known as \large p small n". In the last fifteen years, this problem has been received a lot of attention, specially in the genome-wide context. Here we purposed the bayes H model, a bayesian regression model using mixture of two scaled inverse chi square as hyperprior distribution of variance for each regression coefficient. This model is implemented in the R package BayesH. Introduction 1 In the regression analysis, there are situations where the model have more predictor 2 variables than observations of dependent variable, resulting in the problem known as 3 \large p small n" [1]. 4 To figure out this problem, there are already exists some methods developed as ridge 5 regression [2], least absolute shrinkage and selection operator (LASSO) regression [3], 6 bridge regression [4], smoothly clipped absolute deviation (SCAD) regression [5] and 7 others. This class of regression models is known in the literature as regression model 8 with penalized likelihood [6]. In the bayesian paradigma, there are also some methods 9 purposed as stochastic search variable selection [7], and Bayesian LASSO [8].