Chapter 2

Prerequisites and Vocabulary

Chapter Outline

2.1 Software
    2.1.1 What Is R?
    2.1.2 Working with R
2.2 Important Statistical Terms and How to Handle Them in R
    2.2.1 Data Sets, Variables, and Observations
    2.2.2 Distributions and Summary Statistics
    2.2.3 More on R Objects
    2.2.4 R Functions for Graphics
    2.2.5 Writing Our Own R Functions

2.1 SOFTWARE

In most chapters of this book we work with the statistical software R (R Core Team, 2014). R is a very powerful tool for statistics and graphics in general. However, it is limited with regard to Bayesian methods applied to more complex models. In Part II of the book (Chapters 12–15), we therefore use OpenBUGS (www.openbugs.net; Spiegelhalter et al., 2007) and Stan (Stan Development Team, 2014), using specific interfaces to operate them from within R. OpenBUGS and Stan are introduced in Chapter 12. Here, we briefly introduce R.

2.1.1 What Is R?

R is a software environment for statistics and graphics that is free in two ways: free to download and open source (www.r-project.org). The first version of R was written by Robert Gentleman and Ross Ihaka of the University of Auckland (note that both names begin with "R"). Since 1997, R has been governed by a core group of R contributors (www.r-project.org/contributors.html). R is a descendant of S, the commercial language and environment that was developed at Bell Laboratories by John Chambers and colleagues. Most code written for S runs in R, too. It is an asset of R that, along with statistical analyses, well-designed publication-quality graphics can be produced. R runs on all major operating systems (Linux, Unix, Mac, Windows).


R is different from many statistical software packages that work with menus. R is a command-line language or, in fact, a programming environment. This means that we need to write down our commands in the form of R code. While this may require a bit of effort in the beginning, we will soon be able to reap the first fruits. Writing code forces us to know what we are doing and why we are doing it, and enables us to learn about statistics and the R language rapidly. And because we save the R code of our analyses, they are easily reproduced, comprehensible for colleagues (especially if the code is furnished with comments), and easily adapted and extended to a similar new analysis. Due to its flexibility, R also allows us to write our own functions and to make them available for other users by sharing R code or, even better, by compiling them in an R package. R packages are extensions of the slim basic R distribution, which is supplied with only about eight packages, and typically contain R functions and sometimes also data sets. A steadily increasing number of packages (currently over 5000) is available from the network of CRAN mirror sites, accessible at www.r-project.org. Compared to other dynamic, high-level programming languages such as Python (www.python.org) or Julia (Bezanson et al., 2012; www.julialang.org), R needs more time for complex computations on large data sets. However, the aim of R is to provide an intuitive, "easy to use" programming language for data analyses for those who are not computer specialists (Chambers, 2008), thereby trading off computing power and sometimes also precision of the code. For example, R is quite flexible regarding the use of spaces in the code, which is convenient for the user. In contrast, Python and Julia require stricter coding, which makes the code more precise but also more difficult to learn. Thus, we consider R the ideal language for many statistical problems faced by ecologists and many other scientists.

2.1.2 Working with R

If you are completely new to R, we recommend that you take an introductory course or work through an introductory book or document (see recommendations in the Further Reading section at the end of this chapter). R is organized around functions, that is, defined commands that typically require inputs (arguments) and return an output. In what follows, we explain some important R functions used in this book, without providing a full introduction to R. Moreover, the list of functions explained in this chapter is only a selection and we will come across many other functions in this book. That said, what follows should suffice to give you a jumpstart. We can easily install additional packages by using the function install.packages and load packages by using the function library. Each R function has documentation describing what the function does and how it is used. If the package containing a function is loaded in the current R session, we can open the documentation using ?. Typing ?mean into the R console will open the documentation for the function mean (arithmetic mean). If we are looking for a specific function, we can use the function help.search to search for functions within all installed packages. Typing help.search("linear model") will open a list of functions dealing with linear models (together with the packages containing them). For example, stats::lm suggests the function lm from the package stats. Shorter, but equivalent to help.search("linear model"), is ??"linear model". Alternatively, R's online documentation can also be accessed with help.start(). Functions/packages that are not installed yet can be found using the specific search menu on www.r-project.org. Once familiar with using the R help and searching the internet efficiently for R-related topics, we can independently broaden our knowledge about R. Note that whenever we show R code in this book, the code is printed in a distinct font. Comments, which are preceded by a hash sign, #, and are therefore not executed by R, are printed in green. R output is printed in blue font.
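For example, a minimal sketch of this workflow (assuming an internet connection; the package blmeco accompanies this book) could look as follows:

install.packages("blmeco")    # install a package from CRAN (needed only once)
library(blmeco)               # load the installed package into the current session
?mean                         # open the help page of the function mean
help.search("linear model")   # search all installed packages
??"linear model"              # shorthand for the previous line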

2.2 IMPORTANT STATISTICAL TERMS AND HOW TO HANDLE THEM IN R

2.2.1 Data Sets, Variables, and Observations

Data are always collected on a sample of objects (e.g., animals, plants, or plots). An observation refers to the smallest observational or experimental unit. This can also be a unit smaller than the whole organism or plot, such as the wing of a bird, a leaf of a plant, or a subplot. Data are collected with regard to certain characteristics (e.g., age, sex, size, weight, level of blood parameters), all of which are called variables. A collection of data, a so-called "data set," can consist of one or many variables. The term variable illustrates that these characteristics vary between the observations.
Variables can be classified in several ways, for instance, by the scale of measurement. We distinguish between nominal, ordinal, and numeric variables (see Table 2-1). Nominal and ordinal variables can be summarized as categorical variables. Numeric variables can be further classified as discrete or continuous. Moreover, note that categorical variables are often called factors and numeric variables are often called covariates.
Now let us look at ways to store and handle data in R. A simple, but probably the most important, data structure is a vector. It is a collection of ordered elements of the same type. We can use the function c to combine these elements, which are automatically coerced to a common type. The type of the elements determines the type of the vector. Vectors can (among other things) be used to represent variables. Here are some examples:

v1 <- c(1, 4, 2, 8)                     # a vector of numbers
v2 <- c("bird", "bat", "frog", "bear")  # a vector of words
v3 <- c(1, 4, "bird", "bat")            # numbers and words mixed

TABLE 2-1 Scales of Measurement

Scale: Nominal
  Examples: sex, genotype, habitat
  Properties: identity (values have a unique meaning)
  Typical coding in R: factor

Scale: Ordinal
  Examples: altitudinal zones (e.g., foothill, montane, subalpine, alpine zone)
  Properties: identity and magnitude (values have an ordered relationship; some values are larger and some are smaller)
  Typical coding in R: ordered factor

Scale: Numeric
  Examples: discrete: counts; continuous: body weight, wing length, speed
  Properties: identity, magnitude, and equal intervals (units along the scale are equal to each other), and possibly a minimum value of zero (ratios are interpretable)
  Typical coding in R: integer (discrete) or numeric (continuous)
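As an illustration of the last column of Table 2-1, the three scales of measurement might be coded in R as follows (a sketch with hypothetical values):

sex    <- factor(c("male", "female", "female"))          # nominal: factor
zone   <- ordered(c("foothill", "alpine", "montane"),
                  levels = c("foothill", "montane", "subalpine", "alpine"))  # ordinal: ordered factor
counts <- c(0L, 3L, 1L)                                   # numeric, discrete: integer
weight <- c(12.3, 15.1, 9.8)                              # numeric, continuous: numeric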

R is an object-oriented language and vectors are specific types of objects. The class of an object can be obtained by the function class. A vector of numbers (e.g., v1) is a numeric vector (corresponding to a numeric variable); a vector of words (v2) is a character vector (corresponding to a categorical variable). If we mix numbers and words (v3), we will get a character vector.

class(v1)
[1] "numeric"
class(v2)
[1] "character"
class(v3)
[1] "character"

The function rev can be used to reverse the order of elements.

rev(v1)
[1] 8 2 4 1

Numeric vectors can be used in arithmetic expressions, using the usual arithmetic operators +, -, *, and /, including ^ for raising to a power. The operations are performed element by element. In addition, all of the common arithmetic functions are available (e.g., log and sqrt for the logarithm and the square root). To generate a sequence of numbers, R offers several possibilities. A simple one is the colon operator: 1:30 will produce the sequence 1, 2, 3, ..., 30. The function seq is more general: seq(5, 100, by = 5) will produce the sequence 5, 10, 15, ..., 100.
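As a brief sketch, using the vector v1 defined above, element-wise arithmetic looks like this:

v1 + 10     # adds 10 to every element: 11 14 12 18
v1 * 2      # multiplies every element by 2: 2 8 4 16
v1^2        # squares every element: 1 16 4 64
sqrt(v1)    # element-wise square root
log(v1)     # element-wise natural logarithm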

R also knows logical vectors, which can take the values TRUE or FALSE. We can generate them using conditions defined by the logical operators <, <=, >, >= (less than, less than or equal to, greater than, greater than or equal to), == (exact equality), and != (inequality). The resulting vector contains TRUE where the condition is met and FALSE where it is not. We can further use & (intersection, logical "and"), | (union, logical "or"), and ! (negation, logical "not") to combine logical expressions. When logical vectors are used in arithmetic expressions, they are coerced to numeric, with FALSE becoming 0 and TRUE becoming 1.
Categorical variables should be coded as factors, using the function factor or as.factor. Thereby, the levels of the factor can be coded with characters or with numbers (but the former is often more informative). Ordered categorical variables can be coded as ordered factors by using factor(..., ordered = TRUE) or the function ordered. Other types of vectors include "Date" for date and time variables and "complex" for complex numbers (not used in this book).
Instead of storing variables as individual vectors, we can combine them into a data frame, using the function data.frame. The function produces an object of the class "data.frame," which is the most fundamental data structure used for statistical modeling in R. Different types of variables are allowed within a single data frame. Note that most data sets provided in the package blmeco, which accompanies this book, are data frames.
Data are often entered and stored in spreadsheet files, such as those produced by Excel or LibreOffice. To work with such data in R, we need to read them into R. This can be done with the function read.table (and its descendants), which reads in data in various file formats (e.g., comma- or tab-delimited text) and generates a data frame object. It is very important to consider the specific structure of a data frame and to use the same layout in the original spreadsheet: a data frame is a data table with observations in rows and variables in columns. The first row is the header, which contains the names of the variables. This format is standard practice and should be compatible with all other statistical software packages, too. Now we combine the vectors v1, v2, and v3 created earlier into a data frame called "dat" and print the result by typing the name of the data frame:

dat <- data.frame(number = v1, animal = v2, mix = v3)
dat

  number animal  mix
1      1   bird    1
2      4    bat    4
3      2   frog bird
4      8   bear  bat

By default, the names of the vectors are taken as variable names in dat, but we can also give them new names. A useful function to quickly generate a data frame in some situations (e.g., if we have several categorical variables that we want to combine in a full factorial manner) is expand.grid. We supply a number of vectors (variables) and expand.grid creates a data frame with a row for every combination of elements of the supplied vectors, with the first variables varying fastest. For example:

dat2 <- expand.grid(number = v1, animal = v2)
dat2   # a data frame with one row for each of the 16 combinations

Specific elements of a vector can be selected using square brackets containing either an index or a logical vector; for example, to extract all elements of v1 that are larger than 3:

v1[v1 > 3]
[1] 4 8

Similarly, we can select all rows of dat2 containing observations of bats:

dat2[dat2$animal == "bat", ]
  number animal
5      1    bat
6      4    bat
7      2    bat
8      8    bat

Because dat2 has two dimensions (rows and columns), we need to provide a selection for each dimension, separated by a comma. Because we want all values along the second dimension (all columns), we do not provide anything after the comma (thereby selecting "all there is").
Now let us have a closer look at the data set "cortbowl" from the package blmeco to better understand the structure of data frame objects and to understand the connection between the scale of measurement and the coding of variables in R. We first need to load the package blmeco and then the data set. The function head is convenient for looking at the first six observations of the data frame.

library(blmeco)   # load the package
data(cortbowl)    # load the data set
head(cortbowl)    # show first six observations
  Brood   Ring Implant Age   days totCort
1   301 898331       P  49     20   5.761
2   301 898332       P  29      2   8.418
3   301 898332       P  47     20   8.047
4   301 898333       C  25      2  25.744
5   302 898185       P  57     20   8.041
6   302 898188       C  28 before   6.338

The data frame cortbowl contains data on 151 nestlings of barn owls Tyto alba (identifiable by the variable Ring) of varying age from 54 broods. Each nestling either received a corticosterone implant or a placebo implant (variable Implant with levels C and P). Corticosterone levels (variable totCort) were determined from blood samples taken just before implantation, or 2 or 20 days after implantation (variable days). Each observation (row) refers to one nestling measured on a particular day. Because multiple measurements were taken per nestling and multiple nestlings may belong to the same brood, cortbowl is an example of a hierarchical data set (see Chapter 7). The function str shows the structure of the data frame (and of objects in general).

str(cortbowl)   # show the structure of the data.frame
'data.frame':   287 obs. of  6 variables:
 $ Brood  : Factor w/ 54 levels "231","232","233",..: 7 7 7 7 8 8 ...
 $ Ring   : Factor w/ 151 levels "898054","898055",..: 44 45 45 46 ...
 $ Implant: Factor w/ 2 levels "C","P": 2 2 2 1 2 1 1 1 2 1 ...
 $ Age    : int  49 29 47 25 57 28 35 53 35 31 ...
 $ days   : Factor w/ 3 levels "2","20","before": 2 1 2 1 2 3 1 2 1 1 ...
 $ totCort: num  5.76 8.42 8.05 25.74 8.04 ...

str returns the number of observations (287 in our example) and of variables (6), as well as the names and the coding of the variables. Note that not all nestlings could be measured on each day, so the data set contains only 287 rows (instead of 151 nestlings × 3 days = 453). Brood, Ring, and Implant are nominal categorical variables, although numbers are used to name the levels of Brood and Ring. While character vectors such as Implant are by default transformed to factors by the functions data.frame and read.table, numeric vectors are kept numeric. Thus, if a categorical variable is coded with numbers (as are Brood and Ring), it must be explicitly transformed to a factor using the function factor or as.factor. Coding as a factor ensures that, when used for modeling, these variables are recognized as nominal. However, using words rather than numbers to code factor levels is good practice to avoid erroneously treating a factor as a numeric variable. The variable days is also coded as a factor (with levels "before", "2", and "20"). Age is coded as an integer with only whole days recorded, although age is clearly continuous rather than discrete in nature. Counts would be a more typical case of a discrete variable (see Chapter 8). The variable totCort is a continuous numeric variable. Special types of categorical variables are binary variables, with only two categories (e.g., implant and sex, coded as no/yes or 0/1). We often have to choose whether we treat a variable as a factor or as numeric: for example, we may want to use the variable days as a nominal variable if we are mainly interested in differences (e.g., in totCort) between the day before implantation, day 2, and day 20 after implantation. If we had measured totCort on more than three days, it might be more interesting to use the variable days as numeric (replacing "before" by day 1), to be able to look at the temporal course of totCort.
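A small sketch of both points: checking the hierarchical structure of cortbowl and recoding days as a numeric variable, treating "before" as day 1 as suggested above (the variable name days.num is our own choice for illustration):

length(unique(cortbowl$Ring))    # number of individual nestlings (151)
length(unique(cortbowl$Brood))   # number of broods (54)

days.num <- as.character(cortbowl$days)    # factor levels as text: "before", "2", "20"
days.num[days.num == "before"] <- "1"      # treat "before" as day 1
cortbowl$days.num <- as.numeric(days.num)  # numeric version of days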

2.2.2 Distributions and Summary Statistics

The values of a variable typically vary, that is, they exhibit a certain distribution. Histograms provide a graphical tool to display the shape of distributions. Summary statistics inform us about the distribution of observed values in a sample and allow us to communicate a lot of information in a few numbers. A statistic is a sample property, that is, it can be calculated from the observations in the sample. In contrast, a parameter is a property of the population from which the sample was taken. As parameters are usually unknown (unless we simulate data from a known distribution), we use statistics to estimate them. Table 2-2 gives an overview of some statistics, given a sample x of size n, $\{x_1, x_2, x_3, \ldots, x_n\}$, ordered with $x_1$ being the smallest and $x_n$ the largest value, together with the corresponding R functions (note that the ordering of x is only important for the median).
There are different measures of location of a distribution, that is, of the value around which most values scatter, and measures of dispersion that describe the spread of the distribution. The most important measure of location is the arithmetic mean (or average). It describes the "center" of symmetric distributions (such as the normal distribution). However, it has the disadvantage of being sensitive to extreme values. The median is an alternative measure of location that is generally more appropriate in the case of asymmetric distributions; it is not sensitive to extreme values. The median is the central value of the ordered sample (the formula is given in Table 2-2). If n is even, it is the arithmetic mean of the two most central values.

TABLE 2-2 Most Common Summary Statistics with Corresponding R Functions and Formulae

Arithmetic mean
  Parameter: population mean ($\mu$)
  R function: mean
  Formula: $\hat{\mu} = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Median
  Parameter: population median
  R function: median
  Formula: $\mathrm{median} = x_{(n+1)/2}$ (for uneven n); $\mathrm{median} = \frac{1}{2}\left(x_{n/2} + x_{(n/2)+1}\right)$ (for even n)

Sample variance
  Parameter: population variance ($\sigma^2$)
  R function: var
  Formula: $\hat{\sigma}^2 \;(\text{usually}) = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (\bar{x} - x_i)^2$

Sample standard deviation
  Parameter: population standard deviation ($\sigma$)
  R function: sd
  Formula: $\hat{\sigma} \;(\text{usually}) = s = \sqrt{s^2}$

Note: The parameter listed is the corresponding (unknown) quantity for which the statistic is usually used as an estimate. The hat sign above a parameter name indicates that we refer to its estimate. The maximum likelihood estimate (Chapter 5) of the variance corresponds to the variance formula using n instead of n - 1 in the denominator; see, e.g., Royle & Dorazio (2008).

The spread of a distribution can be measured by the variance or the standard deviation. The variance of a sample is the sum of the squared deviations from the sample mean over all observations, divided by (n - 1). The variance is hard to interpret, as it is usually quite a large number (due to the squaring). The standard deviation (SD), which is the square root of the variance, is easier to interpret. It is approximately the average deviation of an observation from the sample mean. In the case of a normal distribution, about two thirds of the data are expected within one standard deviation around the mean.
Quantiles inform us about both location and spread of a distribution. The p-quantile is the value x with the property that a proportion p of all values are less than or equal to x. The median is the 50% quantile. The 25% quantile and the 75% quantile are also called the lower and upper quartiles, respectively. The range between the 25% and the 75% quantiles is called the interquartile range. This range includes 50% of the distribution and is also used as a measure of dispersion. The R function quantile extracts sample quantiles. The median, the quartiles, and the interquartile range can be graphically displayed using box-and-whisker plots (boxplots for short).
When we use statistical models, we need to make reasonable assumptions about the distribution of the variable we aim to explain (the outcome or response variable). Statistical models, of which a variety is dealt with in this book, are based on certain parametric distributions. "Parametric" means that these distributions are fully described by a few parameters. The most important parametric distribution used for statistical modeling is the normal distribution, also known as the Gaussian distribution. The Gaussian distribution is introduced more technically in Chapter 5. Qualitatively, it describes, at least approximately, the distribution of observations of any (continuous) variable that tends to cluster around the mean. The importance of the normal distribution is a consequence of the central limit theorem. Without going into detail, its practical implications are as follows:

- The sample mean of any sample of random variables (even if these are themselves not normally distributed) tends to have a normal distribution. The larger the sample size, the better the approximation.
- The binomial distribution and the Poisson distribution (Chapter 8) can be approximated by the normal distribution under some circumstances.
- Any variable that is the result of a combination of a large number of small effects (such as phenotypic characteristics that are determined by many genes) tends to show a bell-shaped distribution. This justifies the common use of the normal distribution to describe such data.
- For the same reason, the normal distribution can be used to describe error variation (residual variance) in linear models. In practice, the error is often the sum of many unobserved processes.
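As a brief sketch, the summary statistics of Table 2-2 and the quantiles just described can be obtained for the corticosterone measurements in cortbowl as follows (assuming the package and data set are loaded as above):

mean(cortbowl$totCort)      # arithmetic mean
median(cortbowl$totCort)    # median (50% quantile)
var(cortbowl$totCort)       # sample variance
sd(cortbowl$totCort)        # sample standard deviation
quantile(cortbowl$totCort, probs = c(0.25, 0.5, 0.75))   # quartiles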

If we have a sample of n observations that are normally distributed with mean $\mu$ and standard deviation $\sigma$, then it is known that the arithmetic mean $\bar{x}$ of the sample is normally distributed around $\mu$ with standard deviation $SD_{\bar{x}} = \sigma/\sqrt{n}$. In practice, however, we do not know $\sigma$, but estimate it by the sample standard deviation s. Thus, the standard deviation of the sample mean is estimated by the "standard error of the mean" (SEM), which is calculated as $SEM = s/\sqrt{n}$. While the standard deviation $s\ (= \hat{\sigma})$ describes the variability of individual observations (Table 2-2), SEM describes the uncertainty about the sample mean as an estimate of $\mu$. Due to the division by $\sqrt{n}$, SEM is smaller for large samples and larger for small samples.
We may wonder about the division by (n - 1) in the formula for the sample variance in Table 2-2. This is due to the infamous "degrees of freedom" issue. A quick explanation of why we divide by (n - 1) instead of n is that we need $\bar{x}$, an estimate of the sample mean, to calculate the variance. Using this estimate costs us one degree of freedom, so we divide by n - 1. To see why, let us assume we know that the sum of three numbers is 42. Can we tell what the three numbers are? The answer is no, because we can freely choose the first and the second number. But the third number is fixed as soon as we know the first and the second: it is 42 - (first number + second number). So the degrees of freedom are 3 - 1 = 2 in this case, and this also applies if we know the average of the three numbers instead of their sum. Another explanation is that, because $\bar{x}$ is estimated from the sample, it is exactly in the middle of the data whereas the true population mean would be a bit off. Thus, the sum of

the squared differences, $\sum_{i=1}^{n}(\bar{x} - x_i)^2$, is a little bit smaller than it should be (the sum of squared differences is smallest when taken with regard to the sample mean), and dividing by (n - 1) instead of n corrects for this. In general, whenever k parameters that are estimated from the data are used in a formula to estimate a new parameter, the degrees of freedom for this estimation are n - k (n being the sample size).

2.2.3 More on R Objects

Most R functions are applied to one or several objects and produce one or several new objects. For example, the functions data.frame and read.table produce a data frame. Other data structures are offered by the object classes "array" and "list." An array is an n-dimensional vector. Unlike the different columns of a data frame, all elements of an array need to be of the same type. The object class "matrix" is a special case of an array, one with two dimensions. The function sim, which we introduce in Chapter 3, returns parts of its results in the form of an array. A very useful function for doing calculations based on an array (or a matrix) is apply. We simulate a data set to illustrate this: two sites were visited over five years by each of three observers who counted the number of breeding pairs of storks. We simulate the number of pairs by using the function rpois to get random numbers from a Poisson distribution (Chapter 8). We can create a three-dimensional array containing the numbers of stork pairs, using site, year, and observer as dimensions.

birds <- array(rpois(2 * 5 * 3, lambda = 10),   # a Poisson mean of 10 is used for illustration
               dim = c(2, 5, 3),
               dimnames = list(site = c("Site1", "Site2"),
                               year = 2010:2014,
                               observer = c("Ben", "Sue", "Emma")))
birds

, , observer = Ben

       year
site    2010 2011 2012 2013 2014
  Site1   13   13    5   10    5
  Site2    6   14   11   10    9

, , observer = Sue

       year
site    2010 2011 2012 2013 2014
  Site1   14    6    5   12    7
  Site2    9    9    8    9   12

, , observer = Emma

       year
site    2010 2011 2012 2013 2014
  Site1   10   11   13    8   13
  Site2   13    6    9    8    8

Using apply, we can easily calculate the sum of pairs per observer (across all sites and years) by choosing MARGIN = 3 (for observer), or the mean number of pairs per site and year (averaged over all observers) by choosing MARGIN = c(1, 2) for site and year:

apply(birds, MARGIN = 3, FUN = sum)
  Ben   Sue  Emma
  123   100    92

apply(birds, MARGIN = c(1, 2), FUN = mean)
       year
site        2010      2011 2012     2013      2014
  Site1  6.00000  9.333333    9 13.66667 14.000000
  Site2 10.33333 11.666667   12 10.33333  8.666667

Yet another and rather flexible class of object is the list. A list is a more general form of vector that can contain various elements of different types; often these are themselves lists or vectors. Lists are often used to return the results of a computation. For example, the summary of a linear model produced by lm is contained in a list.
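A minimal sketch of creating a list and extracting its elements (the element names are our own, chosen for illustration only):

res <- list(estimate = 4.2, ci = c(3.1, 5.3), data = dat)  # elements of different types
res$estimate     # extract an element by name
res[["ci"]]      # equivalent extraction using double square brackets
str(res)         # str also shows the structure of a list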

2.2.4 R Functions for Graphics

R offers a variety of possibilities to produce publication-quality graphics (see recommendations in the Further Reading section at the end of this chapter). In this book we stick to the most basic graphical function, plot, to create graphics, to which more elements can easily be added. The plot function is a generic function. This means that the action performed depends on the class of the arguments given to the function. We can add lines, points, segments, or text to an existing plot by using the functions lines or abline, points, segments, and text, respectively. Let's look at some simple examples using the data set cortbowl:

plot(totCort ~ Implant, data = cortbowl)
plot(totCort ~ Age, data = cortbowl[cortbowl$Implant == "P", ])
points(totCort ~ Age, data = cortbowl[cortbowl$Implant == "C", ], pch = 20)
plot(cortbowl)

Using plot on a numeric dependent variable (totCort) and a factor as predictor (Implant) produces a box-and-whisker plot (Figure 2-1, left).

If the predictor is numeric, too (Age), the result is a scatterplot. Note that the scatterplot first shows only the points for nestlings that received the placebo, because we select only the rows for which Implant == "P", using the square brackets. A different symbol (defined by pch = 20) is used for the nestlings that received corticosterone (Implant == "C"). These symbols are added using the function points (Figure 2-1, right). It is also possible to plot the whole data set; the result is a scatterplot of every variable against every other variable. Further, we sometimes use the functions boxplot and hist to explicitly draw box-and-whisker plots and histograms.

FIGURE 2-1 Left: Boxplot of blood corticosterone measurements (totCort) for corticosterone (C) and placebo (P) treated barn owl nestlings. Bold horizontal bar = median; box = interquartile range. The whiskers are drawn from the first or third quartile to the lowest or largest value within 1.5 times the interquartile range, respectively. Circles are observations beyond the whiskers. Right: Blood corticosterone measurements (totCort) in relation to age. Open symbols = placebo-treated nestlings, closed symbols = corticosterone-treated nestlings.
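For instance, a quick sketch of the distribution of totCort using hist, with the sample mean added as a reference line (lty = 2 gives a dashed line), and boxplots of totCort per level of days:

hist(cortbowl$totCort)                         # histogram of the corticosterone measurements
abline(v = mean(cortbowl$totCort), lty = 2)    # add a vertical dashed line at the mean
boxplot(totCort ~ days, data = cortbowl)       # boxplots of totCort for each level of days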

2.2.5 Writing Our Own R Functions

R allows us to write our own functions. Here, we write a function that calculates the standard error of the mean (SEM). We define the following function:

sem <- function(x) sd(x)/sqrt(length(x))

function(x) means that sem is a function of x, x being the only argument required by the function. The function sd calculates the standard deviation of the sample, length extracts the number of elements of the sample (sample size), and sqrt calculates its square root. We can define a vector x and apply the function sem as follows:

x <- c(10, 7, 5, 9, 13, 2, 20, 5)   # a numeric vector (example values)
sem(x)
[1] 1.994971

This is a good time to mention the topic of missing values. Biological data sets often have missing values (e.g., due to organisms that died, or failure to take measurements). In R, missing values are coded as NA and need to be treated explicitly. If we add a missing value to x and apply sem again, sem produces NA, because the function sd produces NA:

x <- c(x, NA)   # append a missing value to x
sem(x)
[1] NA

If we want sem to ignore missing values, we can adapt the function so that it removes the NAs before calculating the standard deviation and counts only the nonmissing elements:

sem <- function(x) sd(x, na.rm = TRUE)/sqrt(sum(!is.na(x)))
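Applied to the vector x from above (which now contains an NA), the revised function returns a finite value again; the same idea, via the argument na.rm, is available in many base R functions:

sem(x)                  # no longer returns NA
mean(x, na.rm = TRUE)   # many base R functions offer an na.rm argument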

FURTHER READING

If you are interested in an introductory statistics book (on frequentist statistics) that works with R, we recommend Crawley (2005), Crawley (2007), Dalgaard (2008), or Logan (2010). The books by Crawley and Logan use biological examples, whereas most of the examples in Dalgaard (2008) come from the medical sciences. Basic statistical knowledge can be acquired from any classical textbook in statistics. Quinn and Keough (2009) provide a nice introduction to experimental design and data analysis, focusing on the close link between design and analysis. They work through a large number of ecological examples, unfortunately omitting modern mixed-effects models. If you really have a hard time finding the motivation to read about statistics, you may want to look at Gonick and Smith (2005). If you need a more thorough introduction to R, we recommend Venables et al. (2014; downloadable from http://cran.r-project.org/doc/manuals/R-intro.pdf), Zuur et al. (2009), or Chambers (2008). To learn more about how to generate specialized, publication-quality graphics, you might want to read a book focusing on R graphics, such as Murrell (2006) or Chang (2012), or the reference books on the R graphics packages lattice (Sarkar, 2008) and ggplot (Wickham, 2009).