BMA: an R Package for Bayesian Model Averaging

Total Page:16

File Type:pdf, Size:1020Kb

BMA: an R Package for Bayesian Model Averaging News The Newsletter of the R Project Volume 5/2, November 2005 Editorial by Douglas Bates Early on they mention the task view for spatial sta- tistical analysis that is being maintained on CRAN. One of the strengths of the R Project is its pack- Task views are a new feature on CRAN designed to age system and its network of archives of pack- make it easier for those who are interested in a par- ages. All useRs will be (or at least should be) famil- ticular topic to navigate their way through the mul- iar with CRAN, the Comprehensive R Archive Net- titude of packages that are available and to find the work, which provides access to more than 600 con- ones targeted to their particular interests. tributed packages. This wealth of contributed soft- To provide the exception that proves the rule, Xie ware is a tribute to the useRs and developeRs who describes the use of condor, a batch process schedul- have written and contributed those packages and ing system, and its DAG (directed acylic graph) capa- to the CRAN maintainers who have devoted untold bility to coordinate remote execution of long-running hours of effort to establishing, maintaining and con- R processes. This article is an exception in that tinuously improving CRAN. it doesn’t deal with an R package but it certainly Kurt Hornik and Fritz Leisch created CRAN and should be of interest to anyone using R for running continue to do the lion’s share of the work in main- simulations or for generating MCMC samples, etc. taining and improving it. They were also the found- and for anyone who works with others who have ing editors of R News. Part of their vision for R News such applications. I can tell you from experience that was to provide a forum in which to highlight some the combination of condor and R, as described by of the packages available on CRAN, to explain their Xie, can help enormously in maintaining civility in purpose and to provide an illustration of their use. a large and active group of statistics researchers who This issue of R News brings that vision to fruition share access to computers. in that almost all our articles are written about CRAN Continuing in the general area of distributed packages by the authors of those packages. statistical computing, L’Ecuyer and Leydold de- We begin with an article by Raftery, Painter and scribe their rstream package for maintaining mul- Volinsky about the BMA package that uses Bayesian tiple reproducible streams of pseudo-random num- Model Averaging to deal with uncertainty in model bers, possibly across simulations that are being run selection. in parallel. Next Pebesma and Bivand discuss the sp pack- This is followed by Benner’s description of his age in the context of providing the underlying classes mfp package for multivariable fractional polynomi- and methods for working with spatial data in R. als and Sailer’s description of his crossdes package Contents of this issue: rstream: Streams of Random Numbers for Stochastic Simulation . 16 mfp . 20 Editorial . 1 crossdes . 24 BMA: An R package for Bayesian Model Aver- R Help Desk . 27 aging . 2 Changes in R . 29 Classes and methods for spatial data in R ... 9 Changes on CRAN . 35 Running Long R Jobs with Condor DAG . 13 Forthcoming Events: useR! 2006 . 43 Vol. 5/2, November 2005 2 for design and randomization in crossover studies. board and I would like to take this opportunity to ex- We round out the issue with regular features that tend my heartfelt thanks to Paul Murrell and Torsten include the R Help Desk, in which Duncan Mur- Hothorn, my associate editors. They have again doch joins Uwe Ligges to explain the arcane art of stepped up and shouldered the majority of the work building R packages under Windows, descriptions of of preparing this issue while I was distracted by changes in the 2.2.0 release of R and recent changes other concerns. Their attitude exemplifies the spirit on CRAN and an announcement of an upcoming of volunteerism and helpfulness that makes it so re- event, useR!2006. Be sure to read all the way through warding (and so much fun) to work on the R Project. to the useR! announcement. If this meeting is any- thing like the previous useR! meetings – and it will Douglas Bates be – it will be great! Please do consider attending. University of Wisconsin – Madison, U.S.A. This is my last issue on the R News editorial [email protected] BMA: An R package for Bayesian Model Averaging by Adrian E. Raftery, Ian S. Painter and Christopher T. does BMA for linear regression using Markov chain Volinsky Monte Carlo model composition (MC3), and allows one to specify a prior distribution and to make in- Bayesian model averaging (BMA) is a way of taking ference about the variables to be included and about account of uncertainty about model form or assump- possible outliers at the same time. tions and propagating it through to inferences about an unknown quantity of interest such as a population parameter, a future observation, or the future payoff Basic ideas or cost of a course of action. The BMA posterior dis- tribution of the quantity of interest is a weighted av- Suppose we want to make inference about an un- erage of its posterior distributions under each of the known quantity of interest ∆, and we have data D. models considered, where a model’s weight is equal We are considering several possible statistical mod- to the posterior probability that it is correct, given els for doing this, M1,..., MK. The number of mod- that one of the models considered is correct. els could be quite large. For example, if we consider Model uncertainty can be large when observa- only regression models but are unsure about which tional data are modeled using regression, or its ex- of p possible predictors to include, there could be as tensions such as generalized linear models or sur- many as 2p models considered. In sociology and epi- vival (or event history) analysis. There are often demiology, for example, values of p on the order of many modeling choices that are secondary to the 30 are not uncommon, and this could correspond to main questions of interest but can still have an im- around 230, or one billion models. portant effect on conclusions. These can include Bayesian statistics expresses all uncertainties in which potential confounding variables to control for, terms of probability, and make all inferences by ap- how to transform or recode variables whose effects plying the basic rules of probability calculus. BMA is may be nonlinear, and which data points to identify no more than basic Bayesian statistics in the presence as outliers and exclude. Each combination of choices of model uncertainty. By a simple application of the represents a statistical model, and the number of pos- law of total probability, the BMA posterior distribu- sible models can be enormous. tion of ∆ is The R package BMA provides ways of carrying K out BMA for linear regression, generalized linear p(∆|D) = ∑ p(∆|D, Mk)p(Mk|D), (1) models, and survival or event history analysis us- k=1 ing Cox proportional hazards models. The func- tions bicreg, bic.glm and bic.surv, account for where p(∆|D, Mk) is the posterior distribution of ∆ uncertainty about the variables to be included in given the model Mk, and p(Mk|D) is the posterior the model, using the simple BIC (Bayesian Informa- probability that Mk is the correct model, given that tion Criterion) approximation to the posterior model one of the models considered is correct. Thus the probabilities. They do an exhaustive search over BMA posterior distribution of ∆ is a weighted aver- the model space using the fast leaps and bounds al- age of the posterior distributions of ∆ under each of gorithm. The function glib allows one to specify the models, weighted by their posterior model prob- one’s own prior distribution. The function MC3.REG abilities. In (1), p(∆|D) and p(∆|D, Mk) can be prob- R News ISSN 1609-3631 Vol. 5/2, November 2005 3 ability density functions, probability mass functions, Computation or cumulative distribution functions. The posterior model probability of Mk is given by Implementing BMA involves two hard computa- tional problems: computing the integrated likeli- p(D|M )p(M ) hoods for all the models (3), and averaging over all ( | ) = k k p Mk D K . (2) the models, whose number can be huge, as in (1) ∑ p(D|M`)p(M`) `=1 and (6). In the functions bicreg (for variable selec- tion in linear regression), bic.glm (for variable selec- In equation (2), p(D|Mk) is the integrated likelihood of tion in generalized linear models), and bic.surv (for model Mk, obtained by integrating (not maximizing) over the unknown parameters: variable selection in survival or event history analy- sis), the integrated likelihood is approximated by the Z BIC approximation (4). The sum over all the mod- p(D|Mk) = p(D|θk, Mk)p(θk|Mk)dθk (3) els is approximated by finding the best models us- Z ing the fast leaps and bounds algorithm. The leaps = ( likelihood × prior )dθk, and bounds algorithm was introduced for all sub- sets regression by Furnival and Wilson (1974), and where θk is the parameter of model Mk and extended to BMA for linear regression and general- p(D|θk, Mk) is the likelihood of θk under model Mk. ized linear models by Raftery (1995), and to BMA for The prior model probabilities are often taken to be survival analysis by Volinsky et al.
Recommended publications
  • Tinn-R Team Has a New Member Working on the Source Code: Wel- Come Huashan Chen
    Editus eBook Series Editus eBooks is a series of electronic books aimed at students and re- searchers of arts and sciences in general. Tinn-R Editor (2010 1. ed. Rmetrics) Tinn-R Editor - GUI forR Language and Environment (2014 2. ed. Editus) José Cláudio Faria Philippe Grosjean Enio Galinkin Jelihovschi Ricardo Pietrobon Philipe Silva Farias Universidade Estadual de Santa Cruz GOVERNO DO ESTADO DA BAHIA JAQUES WAGNER - GOVERNADOR SECRETARIA DE EDUCAÇÃO OSVALDO BARRETO FILHO - SECRETÁRIO UNIVERSIDADE ESTADUAL DE SANTA CRUZ ADÉLIA MARIA CARVALHO DE MELO PINHEIRO - REITORA EVANDRO SENA FREIRE - VICE-REITOR DIRETORA DA EDITUS RITA VIRGINIA ALVES SANTOS ARGOLLO Conselho Editorial: Rita Virginia Alves Santos Argollo – Presidente Andréa de Azevedo Morégula André Luiz Rosa Ribeiro Adriana dos Santos Reis Lemos Dorival de Freitas Evandro Sena Freire Francisco Mendes Costa José Montival Alencar Junior Lurdes Bertol Rocha Maria Laura de Oliveira Gomes Marileide dos Santos de Oliveira Raimunda Alves Moreira de Assis Roseanne Montargil Rocha Silvia Maria Santos Carvalho Copyright©2015 by JOSÉ CLÁUDIO FARIA PHILIPPE GROSJEAN ENIO GALINKIN JELIHOVSCHI RICARDO PIETROBON PHILIPE SILVA FARIAS Direitos desta edição reservados à EDITUS - EDITORA DA UESC A reprodução não autorizada desta publicação, por qualquer meio, seja total ou parcial, constitui violação da Lei nº 9.610/98. Depósito legal na Biblioteca Nacional, conforme Lei nº 10.994, de 14 de dezembro de 2004. CAPA Carolina Sartório Faria REVISÃO Amek Traduções Dados Internacionais de Catalogação na Publicação (CIP) T591 Tinn-R Editor – GUI for R Language and Environment / José Cláudio Faria [et al.]. – 2. ed. – Ilhéus, BA : Editus, 2015. xvii, 279 p. ; pdf Texto em inglês.
    [Show full text]
  • Stan: a Probabilistic Programming Language
    JSS Journal of Statistical Software January 2017, Volume 76, Issue 1. doi: 10.18637/jss.v076.i01 Stan: A Probabilistic Programming Language Bob Carpenter Andrew Gelman Matthew D. Hoffman Columbia University Columbia University Adobe Creative Technologies Lab Daniel Lee Ben Goodrich Michael Betancourt Columbia University Columbia University Columbia University Marcus A. Brubaker Jiqiang Guo Peter Li York University NPD Group Columbia University Allen Riddell Indiana University Abstract Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.14.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the limited memory Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propa- gation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line using the cmdstan package, through R using the rstan package, and through Python using the pystan package. All three interfaces sup- port sampling and optimization-based inference with diagnostics and posterior analysis. rstan and pystan also provide access to log probabilities, gradients, Hessians, parameter transforms, and specialized plotting.
    [Show full text]
  • Stan: a Probabilistic Programming Language
    JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/ Stan: A Probabilistic Programming Language Bob Carpenter Andrew Gelman Matt Hoffman Columbia University Columbia University Adobe Research Daniel Lee Ben Goodrich Michael Betancourt Columbia University Columbia University University of Warwick Marcus A. Brubaker Jiqiang Guo Peter Li University of Toronto, NPD Group Columbia University Scarborough Allen Riddell Dartmouth College Abstract Stan is a probabilistic programming language for specifying statistical models. A Stan program imperatively defines a log probability function over parameters conditioned on specified data and constants. As of version 2.2.0, Stan provides full Bayesian inference for continuous-variable models through Markov chain Monte Carlo methods such as the No-U-Turn sampler, an adaptive form of Hamiltonian Monte Carlo sampling. Penalized maximum likelihood estimates are calculated using optimization methods such as the Broyden-Fletcher-Goldfarb-Shanno algorithm. Stan is also a platform for computing log densities and their gradients and Hessians, which can be used in alternative algorithms such as variational Bayes, expectation propa- gation, and marginal inference using approximate integration. To this end, Stan is set up so that the densities, gradients, and Hessians, along with intermediate quantities of the algorithm such as acceptance probabilities, are easily accessible. Stan can be called from the command line, through R using the RStan package, or through Python using the PyStan package. All three interfaces support sampling and optimization-based inference. RStan and PyStan also provide access to log probabilities, gradients, Hessians, and data I/O. Keywords: probabilistic program, Bayesian inference, algorithmic differentiation, Stan.
    [Show full text]
  • Advancedhmc.Jl: a Robust, Modular and Efficient Implementation of Advanced HMC Algorithms
    2nd Symposium on Advances in Approximate Bayesian Inference, 20191{10 AdvancedHMC.jl: A robust, modular and efficient implementation of advanced HMC algorithms Kai Xu [email protected] University of Edinburgh Hong Ge [email protected] University of Cambridge Will Tebbutt [email protected] University of Cambridge Mohamed Tarek [email protected] UNSW Canberra Martin Trapp [email protected] Graz University of Technology Zoubin Ghahramani [email protected] University of Cambridge & Uber AI Labs Abstract Stan's Hamilton Monte Carlo (HMC) has demonstrated remarkable sampling robustness and efficiency in a wide range of Bayesian inference problems through carefully crafted adaption schemes to the celebrated No-U-Turn sampler (NUTS) algorithm. It is challeng- ing to implement these adaption schemes robustly in practice, hindering wider adoption amongst practitioners who are not directly working with the Stan modelling language. AdvancedHMC.jl (AHMC) contributes a modular, well-tested, standalone implementation of NUTS that recovers and extends Stan's NUTS algorithm. AHMC is written in Julia, a modern high-level language for scientific computing, benefiting from optional hardware acceleration and interoperability with a wealth of existing software written in both Julia and other languages, such as Python. Efficacy is demonstrated empirically by comparison with Stan through a third-party Markov chain Monte Carlo benchmarking suite. 1. Introduction Hamiltonian Monte Carlo (HMC) is an efficient Markov chain Monte Carlo (MCMC) algo- rithm which avoids random walks by simulating Hamiltonian dynamics to make proposals (Duane et al., 1987; Neal et al., 2011). Due to the statistical efficiency of HMC, it has been widely applied to fields including physics (Duane et al., 1987), differential equations (Kramer et al., 2014), social science (Jackman, 2009) and Bayesian inference (e.g.
    [Show full text]
  • Maple Advanced Programming Guide
    Maple 9 Advanced Programming Guide M. B. Monagan K. O. Geddes K. M. Heal G. Labahn S. M. Vorkoetter J. McCarron P. DeMarco c Maplesoft, a division of Waterloo Maple Inc. 2003. ­ ii ¯ Maple, Maplesoft, Maplet, and OpenMaple are trademarks of Water- loo Maple Inc. c Maplesoft, a division of Waterloo Maple Inc. 2003. All rights re- ­ served. The electronic version (PDF) of this book may be downloaded and printed for personal use or stored as a copy on a personal machine. The electronic version (PDF) of this book may not be distributed. Information in this document is subject to change without notice and does not rep- resent a commitment on the part of the vendor. The software described in this document is furnished under a license agreement and may be used or copied only in accordance with the agreement. It is against the law to copy the software on any medium as specifically allowed in the agreement. Windows is a registered trademark of Microsoft Corporation. Java and all Java based marks are trademarks or registered trade- marks of Sun Microsystems, Inc. in the United States and other countries. Maplesoft is independent of Sun Microsystems, Inc. All other trademarks are the property of their respective owners. This document was produced using a special version of Maple that reads and updates LATEX files. Printed in Canada ISBN 1-894511-44-1 Contents Preface 1 Audience . 1 Worksheet Graphical Interface . 2 Manual Set . 2 Conventions . 3 Customer Feedback . 3 1 Procedures, Variables, and Extending Maple 5 Prerequisite Knowledge . 5 In This Chapter .
    [Show full text]
  • End-User Probabilistic Programming
    End-User Probabilistic Programming Judith Borghouts, Andrew D. Gordon, Advait Sarkar, and Neil Toronto Microsoft Research Abstract. Probabilistic programming aims to help users make deci- sions under uncertainty. The user writes code representing a probabilistic model, and receives outcomes as distributions or summary statistics. We consider probabilistic programming for end-users, in particular spread- sheet users, estimated to number in tens to hundreds of millions. We examine the sources of uncertainty actually encountered by spreadsheet users, and their coping mechanisms, via an interview study. We examine spreadsheet-based interfaces and technology to help reason under uncer- tainty, via probabilistic and other means. We show how uncertain values can propagate uncertainty through spreadsheets, and how sheet-defined functions can be applied to handle uncertainty. Hence, we draw conclu- sions about the promise and limitations of probabilistic programming for end-users. 1 Introduction In this paper, we discuss the potential of bringing together two rather distinct approaches to decision making under uncertainty: spreadsheets and probabilistic programming. We start by introducing these two approaches. 1.1 Background: Spreadsheets and End-User Programming The spreadsheet is the first \killer app" of the personal computer era, starting in 1979 with Dan Bricklin and Bob Frankston's VisiCalc for the Apple II [15]. The primary interface of a spreadsheet|then and now, four decades later|is the grid, a two-dimensional array of cells. Each cell may hold a literal data value, or a formula that computes a data value (and may depend on the data in other cells, which may themselves be computed by formulas).
    [Show full text]
  • An Introduction to Data Analysis Using the Pymc3 Probabilistic Programming Framework: a Case Study with Gaussian Mixture Modeling
    An introduction to data analysis using the PyMC3 probabilistic programming framework: A case study with Gaussian Mixture Modeling Shi Xian Liewa, Mohsen Afrasiabia, and Joseph L. Austerweila aDepartment of Psychology, University of Wisconsin-Madison, Madison, WI, USA August 28, 2019 Author Note Correspondence concerning this article should be addressed to: Shi Xian Liew, 1202 West Johnson Street, Madison, WI 53706. E-mail: [email protected]. This work was funded by a Vilas Life Cycle Professorship and the VCRGE at University of Wisconsin-Madison with funding from the WARF 1 RUNNING HEAD: Introduction to PyMC3 with Gaussian Mixture Models 2 Abstract Recent developments in modern probabilistic programming have offered users many practical tools of Bayesian data analysis. However, the adoption of such techniques by the general psychology community is still fairly limited. This tutorial aims to provide non-technicians with an accessible guide to PyMC3, a robust probabilistic programming language that allows for straightforward Bayesian data analysis. We focus on a series of increasingly complex Gaussian mixture models – building up from fitting basic univariate models to more complex multivariate models fit to real-world data. We also explore how PyMC3 can be configured to obtain significant increases in computational speed by taking advantage of a machine’s GPU, in addition to the conditions under which such acceleration can be expected. All example analyses are detailed with step-by-step instructions and corresponding Python code. Keywords: probabilistic programming; Bayesian data analysis; Markov chain Monte Carlo; computational modeling; Gaussian mixture modeling RUNNING HEAD: Introduction to PyMC3 with Gaussian Mixture Models 3 1 Introduction Over the last decade, there has been a shift in the norms of what counts as rigorous research in psychological science.
    [Show full text]
  • GNU Octave a High-Level Interactive Language for Numerical Computations Edition 3 for Octave Version 2.0.13 February 1997
    GNU Octave A high-level interactive language for numerical computations Edition 3 for Octave version 2.0.13 February 1997 John W. Eaton Published by Network Theory Limited. 15 Royal Park Clifton Bristol BS8 3AL United Kingdom Email: [email protected] ISBN 0-9541617-2-6 Cover design by David Nicholls. Errata for this book will be available from http://www.network-theory.co.uk/octave/manual/ Copyright c 1996, 1997John W. Eaton. This is the third edition of the Octave documentation, and is consistent with version 2.0.13 of Octave. Permission is granted to make and distribute verbatim copies of this man- ual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the en- tire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the same conditions as for modified versions. Portions of this document have been adapted from the gawk, readline, gcc, and C library manuals, published by the Free Software Foundation, 59 Temple Place—Suite 330, Boston, MA 02111–1307, USA. i Table of Contents Publisher’s Preface ...................... 1 Author’s Preface ........................ 3 Acknowledgements ........................................ 3 How You Can Contribute to Octave ........................ 5 Distribution .............................................. 6 1 A Brief Introduction to Octave ....... 7 1.1 Running Octave...................................... 7 1.2 Simple Examples ..................................... 7 Creating a Matrix ................................. 7 Matrix Arithmetic ................................. 8 Solving Linear Equations..........................
    [Show full text]
  • Stan Modeling Language
    Stan Modeling Language User’s Guide and Reference Manual Stan Development Team Stan Version 2.12.0 Tuesday 6th September, 2016 mc-stan.org Stan Development Team (2016) Stan Modeling Language: User’s Guide and Reference Manual. Version 2.12.0. Copyright © 2011–2016, Stan Development Team. This document is distributed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). For full details, see https://creativecommons.org/licenses/by/4.0/legalcode The Stan logo is distributed under the Creative Commons Attribution- NoDerivatives 4.0 International License (CC BY-ND 4.0). For full details, see https://creativecommons.org/licenses/by-nd/4.0/legalcode Stan Development Team Currently Active Developers This is the list of current developers in order of joining the development team (see the next section for former development team members). • Andrew Gelman (Columbia University) chief of staff, chief of marketing, chief of fundraising, chief of modeling, max marginal likelihood, expectation propagation, posterior analysis, RStan, RStanARM, Stan • Bob Carpenter (Columbia University) language design, parsing, code generation, autodiff, templating, ODEs, probability functions, con- straint transforms, manual, web design / maintenance, fundraising, support, training, Stan, Stan Math, CmdStan • Matt Hoffman (Adobe Creative Technologies Lab) NUTS, adaptation, autodiff, memory management, (re)parameterization, optimization C++ • Daniel Lee (Columbia University) chief of engineering, CmdStan, builds, continuous integration, testing, templates,
    [Show full text]
  • Maple 7 Programming Guide
    Maple 7 Programming Guide M. B. Monagan K. O. Geddes K. M. Heal G. Labahn S. M. Vorkoetter J. McCarron P. DeMarco c 2001 by Waterloo Maple Inc. ii • Waterloo Maple Inc. 57 Erb Street West Waterloo, ON N2L 6C2 Canada Maple and Maple V are registered trademarks of Waterloo Maple Inc. c 2001, 2000, 1998, 1996 by Waterloo Maple Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the copyright holder, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone. Contents 1 Introduction 1 1.1 Getting Started ........................ 2 Locals and Globals ...................... 7 Inputs, Parameters, Arguments ............... 8 1.2 Basic Programming Constructs ............... 11 The Assignment Statement ................. 11 The for Loop......................... 13 The Conditional Statement ................. 15 The while Loop....................... 19 Modularization ........................ 20 Recursive Procedures ..................... 22 Exercise ............................ 24 1.3 Basic Data Structures .................... 25 Exercise ............................ 26 Exercise ............................ 28 A MEMBER Procedure ..................... 28 Exercise ............................ 29 Binary Search ......................... 29 Exercises ........................... 30 Plotting the Roots of a Polynomial ............. 31 1.4 Computing with Formulæ .................. 33 The Height of a Polynomial ................. 34 Exercise ...........................
    [Show full text]
  • Bayesian Linear Mixed Models with Polygenic Effects
    JSS Journal of Statistical Software June 2018, Volume 85, Issue 6. doi: 10.18637/jss.v085.i06 Bayesian Linear Mixed Models with Polygenic Effects Jing Hua Zhao Jian’an Luan Peter Congdon University of Cambridge University of Cambridge University of London Abstract We considered Bayesian estimation of polygenic effects, in particular heritability in relation to a class of linear mixed models implemented in R (R Core Team 2018). Our ap- proach is applicable to both family-based and population-based studies in human genetics with which a genetic relationship matrix can be derived either from family structure or genome-wide data. Using a simulated and a real data, we demonstrate our implementa- tion of the models in the generic statistical software systems JAGS (Plummer 2017) and Stan (Carpenter et al. 2017) as well as several R packages. In doing so, we have not only provided facilities in R linking standalone programs such as GCTA (Yang, Lee, Goddard, and Visscher 2011) and other packages in R but also addressed some technical issues in the analysis. Our experience with a host of general and special software systems will facilitate investigation into more complex models for both human and nonhuman genetics. Keywords: Bayesian linear mixed models, heritability, polygenic effects, relationship matrix, family-based design, genomewide association study. 1. Introduction The genetic basis of quantitative phenotypes has been a long-standing research problem as- sociated with a large and growing literature, and one of the earliest was by Fisher(1918) on additive effects of genetic variants (the polygenic effects). In human genetics it is common to estimate heritability, the proportion of polygenic variance to the total phenotypic variance, through twin and family studies.
    [Show full text]
  • The R Journal Volume 3/2, December 2011
    The Journal Volume 3/2, December 2011 A peer-reviewed, open-access publication of the R Foundation for Statistical Computing Contents Editorial..................................................3 Contributed Research Articles Creating and Deploying an Application with (R)Excel and R...................5 glm2: Fitting Generalized Linear Models with Convergence Problems.............. 12 Implementing the Compendium Concept with Sweave and DOCSTRIP............. 16 Watch Your Spelling!........................................... 22 Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming 29 Nonparametric Goodness-of-Fit Tests for Discrete Null Distributions............... 34 Using the Google Visualisation API with R.............................. 40 GrapheR: a Multiplatform GUI for Drawing Customizable Graphs in R............. 45 rainbow: An R Package for Visualizing Functional Time Series.................. 54 Programmer’s Niche Portable C++ for R Packages...................................... 60 News and Notes R’s Participation in the Google Summer of Code 2011....................... 64 Conference Report: useR! 2011..................................... 68 Forthcoming Events: useR! 2012.................................... 70 Changes in R............................................... 72 Changes on CRAN............................................ 84 News from the Bioconductor Project.................................. 86 R Foundation News........................................... 87 2 The Journal is a peer-reviewed publication
    [Show full text]