Sweave: Reproducible Research using R and LaTeX

Sandra D. Griffith
Department of Biostatistics and Epidemiology
University of Pennsylvania
[email protected]

Biostatistics Computing Workshop Series
March 15, 2012

S. Griffith ([email protected]) Sweave 15 March 2012 1 / 20

Non-reproducible Research

• Characteristics
  - Prepare or manipulate data in a spreadsheet
  - Cut and paste output to create tables
  - Multiple versions of data and analysis scripts
  - Create many versions of graphics, selecting only one for final presentation of results
• Problems
  - Data, code, and results not linked
  - Any changes in analysis or data require manual regeneration of results
  - Workflow or organization scheme may change over time
  - Can be difficult to replicate in the future
  - Less forensic evidence if results are questioned

Response to Duke University Scandal

“We now require most of our reports to be written using Sweave, a combination of LaTeX source and R code (SASweave and odfWeave are also available) so that we can rerun the reports as needed and get the same results.”

Sweave: Conceptual Overview

• Link data, code, and results with a single .Rnw file
  - Similar to a .tex file, but includes interspersed “chunks” of R code
  - Uses noweb syntax for literate programming
• Weave the .Rnw file to produce a .tex file that includes output from the R code
• Compile the .tex file to PDF or PS as usual
• Tangle the .Rnw file to extract the R code into a separate file
• Creates an individual file for each figure, in addition to including the figures in the output
• Can refer to the values of R expressions (e.g. objects created in chunks) in regular document text using \Sexpr{}
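The weave and tangle steps can be tried end to end from an R session. The following is a minimal sketch, not taken from the original slides: the file name demo.Rnw, the chunk label, and the document text are all illustrative assumptions. It writes a tiny .Rnw file to a scratch directory, weaves it to demo.tex, and tangles it to demo.R.

```r
# Minimal sketch: write a tiny .Rnw file, then weave and tangle it.
# The file name "demo.Rnw" and its contents are illustrative assumptions.
old_wd <- setwd(tempdir())   # work in a scratch directory

rnw <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<compute, echo=TRUE>>=",
  "x <- exp(2.3)",
  "x",
  "@",
  "The value of $e^{2.3}$ is \\Sexpr{round(x, 2)}.",
  "\\end{document}"
)
writeLines(rnw, "demo.Rnw")

utils::Sweave("demo.Rnw")    # weave: writes demo.tex with the R output included
utils::Stangle("demo.Rnw")   # tangle: extracts the R code into demo.R

tex_lines <- readLines("demo.tex")   # woven LaTeX source
r_lines   <- readLines("demo.R")     # extracted R code
setwd(old_wd)                        # restore the original working directory
```

In the woven file, the \Sexpr{} call is replaced by the rounded value of x, while demo.R contains only the code from the chunk.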

Getting Started with Sweave

• Assume R and LaTeX already installed
• Sweave.sty is already included with base R installation
  - Preferred method: include R folder containing Sweave.sty in your TeX path
    * Will automatically update style file when you update R
  - Copy Sweave.sty to a centralized location with other style files, also in your TeX path
    * Requires manual updates, but can be located in a central location shared among computers (e.g. Dropbox)
  - Hard-coded path: include \usepackage{...\Sweave} in preamble
  - Copy Sweave.sty into same folder as each .Rnw file
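To find the folder that holds Sweave.sty, R can report it directly. This is a small sketch that assumes the usual layout of an R installation, where the style file sits under share/texmf/tex/latex inside the R home directory. The variable names sty_dir and sty_path are illustrative.

```r
# Sketch, assuming the standard R installation layout.
# sty_dir is the folder to add to the TeX path (or to copy Sweave.sty from).
sty_dir  <- file.path(R.home("share"), "texmf", "tex", "latex")
sty_path <- file.path(sty_dir, "Sweave.sty")

# Report whether the style file is where we expect it on this machine.
if (file.exists(sty_path)) {
  message("Sweave.sty found in: ", sty_dir)
} else {
  message("Sweave.sty not found at the assumed location: ", sty_path)
}
```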

Anatomy of a Code Chunk

<<label (optional), options>>=
insert R code here
@

Commonly used options (see manual for full list)
• echo = F: suppress R input from appearing in document (default = T)
• eval = F: R code not evaluated (default = T)
• results = hide: suppress R output from appearing in document (default = verbatim)
• results = tex: R output will be read as TeX (default = verbatim)
• fig = T: code chunk includes a figure (default = F)

Global Options

Default options can be set in the preamble and updated throughout the document

• Set R chunk options
    \SweaveOpts{eval=T, echo=F}
• Preserve comments and spacing of echoed R code
    \SweaveOpts{keep.source=TRUE}
• Figure options for height, width, and file type

Example

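Default options (input and output shown):

<<>>=
x <- exp(2.3)
x
@

> x <- exp(2.3)
> x
[1] 9.974182

Input suppressed (echo = F):

<<echo=F>>=
x <- exp(2.3)
x
@

[1] 9.974182

Output suppressed (e.g. results = hide):

<<results=hide>>=
x <- exp(2.3)
x
@

> x <- exp(2.3)
> x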

Compiling an Sweave Document

• Manually (Windows or Mac)
  1. Run Sweave("foo.Rnw") in R console
  2. Open foo.tex in a TeX editor
  3. Compile PDF using TeX editor
  4. Run Stangle("foo.Rnw") to extract R code if desired
• Manually (Linux/Unix)
  1. Run R CMD Sweave foo.Rnw
  2. Run pdflatex foo (or latex foo)
• Integrated Development Environment (IDE)
  - RStudio, Emacs (ESS), Eclipse (StatET), etc.
  - If supported, usually one click/command for all steps (Sweave, compile TeX, view PDF)
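The manual steps can also be wrapped in a single R call. The sketch below is one possible way to do it and is not part of Sweave itself: weave_and_compile is a hypothetical helper name. It weaves the file, then compiles the result with tools::texi2pdf() only if a pdflatex binary is found on the PATH, and reports rather than stops if compilation fails.

```r
# Hypothetical helper (not part of Sweave): weave an .Rnw file and, when
# pdflatex is available, compile the resulting .tex file to PDF.
weave_and_compile <- function(rnw_file) {
  tex_file <- sub("\\.Rnw$", ".tex", basename(rnw_file))
  utils::Sweave(rnw_file)                    # writes tex_file in getwd()

  compiled <- FALSE
  if (nzchar(Sys.which("pdflatex"))) {
    # texi2pdf runs LaTeX as often as needed; failures are caught so the
    # woven .tex file is still usable for manual compilation.
    compiled <- tryCatch({
      tools::texi2pdf(tex_file, clean = TRUE)
      TRUE
    }, error = function(e) {
      message("LaTeX compilation failed: ", conditionMessage(e))
      FALSE
    })
  } else {
    message("pdflatex not found; compile ", tex_file, " manually")
  }
  invisible(list(tex = tex_file, compiled = compiled))
}
```

Calling weave_and_compile("foo.Rnw") from the folder containing foo.Rnw leaves foo.tex (and foo.pdf, if compilation succeeded) in the working directory.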

RStudio

The xtable Package: Basic Table Code

R package to convert many R objects to LaTeX or HTML tables

<<results=tex>>=
library(xtable)
data(tli)
xtable(table(tli$ethnicty, tli$sex),
       caption="Distribution of gender and ethnicity")
@

<<results=tex>>=
lm1 <- lm(tlimth ~ sex + ethnicty, data=tli)
xtable(lm1, caption="Linear Model Results")
@

The xtable package: Basic Table Output

            F   M
BLACK      11  12
HISPANIC    8  12
OTHER       2   0
WHITE      30  25
Table: Distribution of gender and ethnicity

                  Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        71.0226      3.2894    21.59    0.0000
sexM                3.3734      2.8594     1.18    0.2410
ethnictyHISPANIC   -3.7466      4.3044    -0.87    0.3863
ethnictyOTHER      18.4774     10.4716     1.76    0.0809
ethnictyWHITE       7.4622      3.4964     2.13    0.0354
Table: Linear Model Results

The xtable package: Customized Tables

> mat <- round(matrix(c(0.9, 0.89, 200, 0.045, 2.0),
+                     c(1, 5)), 4)
> rownames(mat) <- "$y_{t-1}$"
> colnames(mat) <- c("$R^2$", "$\\bar{R}^2$",
+                    "F-stat", "S.E.E", "DW")
> mat <- xtable(mat)
> print(mat, sanitize.text.function = function(x){x})

            $R^2$  $\bar{R}^2$  F-stat  S.E.E    DW
$y_{t-1}$    0.90         0.89  200.00   0.04  2.00

Almost all functionality available for LaTeX tables can be included directly in R code using xtable

Aside: Using xtable for MS Word Tables

Non-statistical collaborators often prefer tabular results in MS Word

print(xtable(table(tli$ethnicty, tli$sex)),
      type="html", file="TabGenderRace.htm")

1. Save results in an HTML file by calling print() on the xtable object with type="html" and a file name
2. Open “TabGenderRace.htm” in a browser
3. Copy and paste into Word document as a fully formatted table

Basic Figure Example

<<fig=T>>=
plot(1:10, rnorm(10))
@

[Figure: scatterplot of rnorm(10) against 1:10]

NB: Embed figure chunk within a LaTeX figure environment for more precise control

Large or Computationally Intensive Projects

• Use \input statements or makefiles
• save() and load() intermediate results
• Conditional evaluation: if (file exists) {load file} else {run; save file}
• Change R chunk evaluation options as necessary
• cacheSweave package: caches intermediate results
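The conditional-evaluation pattern might look like the following inside a chunk. This is a sketch under stated assumptions: the cache file name slow_fit.RData is hypothetical, and the lm() fit on the built-in cars data only stands in for a genuinely slow computation.

```r
# Sketch of "if (file exists) {load file} else {run; save file}".
# cache_file is a hypothetical name; lm() stands in for a slow computation.
cache_file <- file.path(tempdir(), "slow_fit.RData")

if (file.exists(cache_file)) {
  load(cache_file)                        # restores the saved object `fit`
} else {
  fit <- lm(dist ~ speed, data = cars)    # run the expensive step once
  save(fit, file = cache_file)            # keep the result for later runs
}
```

On the first weave the model is fitted and saved. Later weaves load the saved object instead, so deleting the cache file is the way to force a rerun after the data or code changes.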

Including R code as an Appendix

• Useful for homework, solution sets, etc.
• Include \usepackage{listings} in the preamble
• Include the following R chunk and TeX code in foo.Rnw where you would like to place the appendix

<<>>=
Stangle(file="foo.Rnw", output="foo.R", annotate=FALSE)
@
\pagebreak
\section{R Code}
\texttt{\lstinputlisting[emptylines=0]{foo.R}}

Miscellaneous Sweave Tricks

• Load all libraries in one chunk with results = hide option to suppress unwanted output (e.g. package dependencies)
• Beamer presentations
  - Include [fragile] option for every frame with R code to handle verbatim output
  - For frames with TeX and verbatim output, must include [containsverbatim] option instead
• R graphics package
  - Must use print() wrapper for ggplot objects
• R session information
    > toLatex(sessionInfo(), locale=F)
  - R version 2.14.1 (2011-12-22), x86_64-pc-mingw32
  - Base packages: base, datasets, graphics, grDevices, methods, stats, utils
  - Other packages: xtable 1.7-0
  - Loaded via a namespace (and not attached): tools 2.14.1

Alternatives for Reproducible Research

• R for other document formats
  - HTML: R2HTML
  - OpenOffice: odfWeave
  - MS Word: SWord
  - MS PowerPoint: R2PPT
• Other statistical packages
  - StatWeave for SAS, Stata, or MATLAB and LaTeX or OpenOffice
  - Various other software-specific report generators

Resources

• Sweave user manual (Friedrich Leisch): http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf
• Stack Overflow questions tagged Sweave: http://stackoverflow.com/questions/tagged/sweave
• Keith Baggerly’s introduction to Sweave: http://bioinformatics.mdanderson.org/SweaveTalk/sweaveTalkb.pdf
• Quick-R summary of alternatives to Sweave: http://www.statmethods.net/interface/output.html
• Citing R with Sweave: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SweaveLatex/RCitation.pdf
• xtable gallery with examples: http://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf
