Sweave: Reproducible Research using R and LaTeX
Sandra D. Griffith
Department of Biostatistics and Epidemiology, University of Pennsylvania
[email protected]
Biostatistics Computing Workshop Series March 15, 2012
S. Griffith ([email protected]) Sweave 15 March 2012

Non-reproducible Research
• Characteristics
  ▸ Prepare or manipulate data in a spreadsheet
  ▸ Cut and paste output to create tables
  ▸ Multiple versions of data and analysis scripts
  ▸ Create many versions of graphics, selecting only one for the final presentation of results
• Problems
  ▸ Data, code, and results are not linked
  ▸ Any change in the analysis or data requires regenerating results by hand
  ▸ Workflow or organization scheme may change over time
  ▸ Can be difficult to replicate in the future
  ▸ Less forensic evidence if results are questioned
Response to Duke University Scandal
“We now require most of our reports to be written using Sweave, a literate programming combination of LaTeX source and R code (SASweave and odfWeave are also available) so that we can rerun the reports as needed and get the same results.”
Sweave: Conceptual Overview
• Link data, code, and results with a single .Rnw file
  ▸ Similar to a .tex file, but with interspersed “chunks” of R code
  ▸ Uses noweb syntax for literate programming
• Weave the .Rnw file to produce a .tex file that includes the output of the R code
• Compile the .tex file to PDF or PS as usual
• Tangle the .Rnw file to extract the R code into a separate file
• Weaving also writes each figure to its own file, in addition to including it in the output
• Use \Sexpr{} to insert the value of an R expression (e.g. an object created in an earlier chunk) into regular document text
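A minimal sketch of the weave/tangle round trip, using `Sweave()` and `Stangle()` from base R's utils package. It assumes the working directory can be switched to `tempdir()`; the file name `overview-demo.Rnw` and the chunk label are hypothetical.

```r
# Weave and tangle one small .Rnw file; everything is written to tempdir().
old_wd <- setwd(tempdir())

writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "The mean of the first ten integers is computed below.",
  "<<mean-chunk>>=",
  "m <- mean(1:10)",
  "m",
  "@",
  "\\end{document}"
), "overview-demo.Rnw")

Sweave("overview-demo.Rnw", quiet = TRUE)   # weave: .Rnw -> .tex (code plus its output)
Stangle("overview-demo.Rnw", quiet = TRUE)  # tangle: .Rnw -> .R (code only)

tex  <- readLines("overview-demo.tex")
code <- readLines("overview-demo.R")
setwd(old_wd)
```

The .tex file now contains both the echoed input (`> m <- mean(1:10)`) and its output, while the .R file contains only the chunk code plus comment annotations.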
Getting Started with Sweave
• Assume R and LaTeX are already installed
• Sweave.sty is already included with the base R installation; LaTeX must be able to find it. Options:
  ▸ Preferred method: include the R folder containing Sweave.sty in your TeX path
    ★ The style file updates automatically when you update R
  ▸ Copy Sweave.sty to a centralized location with your other style files, also in your TeX path
    ★ Requires manual updates, but the location can be shared among computers (e.g. Dropbox)
  ▸ Hard path: include \usepackage{.../Sweave} in the preamble (use forward slashes in the path, since a backslash starts a TeX command)
  ▸ Copy Sweave.sty into the same folder as each .Rnw file
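To use the first or last option you need to know where R keeps its copy of Sweave.sty. The sketch below assumes the standard layout in which R ships it under `share/texmf/tex/latex` inside `R.home()`; `project_dir` is a hypothetical project folder.

```r
# Locate the Sweave.sty shipped with R (assumes the standard share/texmf layout).
sty_dir  <- file.path(R.home("share"), "texmf", "tex", "latex")
sty_file <- file.path(sty_dir, "Sweave.sty")
print(sty_dir)  # candidate directory to add to the TeX search path

# Last option: copy the style file next to a project's .Rnw files.
# "my-project" is a hypothetical folder created under tempdir() for illustration.
project_dir <- file.path(tempdir(), "my-project")
dir.create(project_dir, showWarnings = FALSE)
file.copy(sty_file, project_dir, overwrite = TRUE)
```

Adding `sty_dir` to the TeX search path (for example through the `TEXINPUTS` environment variable) corresponds to the preferred option; the copy corresponds to the last one.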
Anatomy of a Code Chunk
<<label (optional), options>>=
insert R code here
@
Commonly used options (see the manual for the full list):
• echo = F: suppress R input from appearing in the document (default = T)
• eval = F: R code is not evaluated (default = T)
• results = hide: suppress R output from appearing in the document (default = verbatim)
• results = tex: R output is read as TeX (default = verbatim)
• fig = T: code chunk produces a figure (default = F)
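One way to see what each option does is to weave a small document and inspect the generated .tex. This is a sketch under the assumption that base R's `Sweave()` is used; the helper `weave_lines()` and the chunk labels are hypothetical, and all files go to `tempdir()`.

```r
# Hypothetical helper: write .Rnw lines to tempdir(), weave them, return the .tex lines.
weave_lines <- function(rnw_lines, name = "options-demo") {
  old_wd <- setwd(tempdir())
  on.exit(setwd(old_wd))
  writeLines(rnw_lines, paste0(name, ".Rnw"))
  Sweave(paste0(name, ".Rnw"), quiet = TRUE)
  readLines(paste0(name, ".tex"))
}

tex <- weave_lines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<shown>>=",                            # defaults: echo, eval, results=verbatim
  "y <- 1 + 1",
  "y",
  "@",
  "<<quiet, echo=FALSE, results=hide>>=",  # runs, but neither code nor output appears
  "z <- 40 + 2",
  "z",
  "@",
  "<<asis, echo=FALSE, results=tex>>=",    # output is pasted into the .tex as-is
  "cat('\\\\textbf{bold from R}')",
  "@",
  "<<skipped, eval=FALSE>>=",              # code is shown but never run
  "stop(\"never run\")",
  "@",
  "\\end{document}"
))
```

Because Sweave evaluates chunks in the global environment, `z` exists after weaving even though nothing about it appears in the document.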
Global Options
Default options can be set in preamble and updated throughout document
• Set R chunk options: \SweaveOpts{eval=T, echo=F}
• Preserve comments and spacing of echoed R code: \SweaveOpts{keep.source=TRUE}
• Figure options for height, width, and file type, e.g. \SweaveOpts{width=5, height=4, pdf=TRUE, eps=FALSE} (width and height in inches)
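A sketch of a preamble-level default being overridden by a single chunk, assuming base R's `Sweave()`; the file name `global-demo.Rnw` and the chunk labels are hypothetical. The global `echo=FALSE` hides the first chunk's code, while the second chunk switches echoing back on and, thanks to `keep.source=TRUE`, keeps its comment.

```r
# \SweaveOpts in the preamble sets defaults; chunk options override them locally.
old_wd <- setwd(tempdir())
writeLines(c(
  "\\documentclass{article}",
  "\\SweaveOpts{echo=FALSE, keep.source=TRUE}",
  "\\begin{document}",
  "<<hidden-code>>=",          # inherits echo=FALSE: only the output appears
  "w <- 6 * 7",
  "w",
  "@",
  "<<shown-code, echo=TRUE>>=",
  "# this comment survives because keep.source=TRUE",
  "v <- w + 1",
  "@",
  "\\end{document}"
), "global-demo.Rnw")
Sweave("global-demo.Rnw", quiet = TRUE)
tex <- readLines("global-demo.tex")
setwd(old_wd)
```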
Example
> x <- exp(2.3)
> x
[1] 9.974182

[1] 9.974182
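The chunk sources for this example were lost in extraction. The sketch below is one plausible .Rnw that produces the output shown: an echoed chunk, then a chunk with `echo=FALSE` that prints only the result, then an inline `\Sexpr{}`. The file name and the rounding inside `\Sexpr{}` are assumptions for illustration.

```r
# Weave an echoed chunk, an output-only chunk, and an inline \Sexpr{} expression.
old_wd <- setwd(tempdir())
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "x <- exp(2.3)",
  "x",
  "@",
  "<<echo=FALSE>>=",
  "x",
  "@",
  "Rounded to two decimals, $x$ is \\Sexpr{round(x, 2)}.",
  "\\end{document}"
), "example-demo.Rnw")
Sweave("example-demo.Rnw", quiet = TRUE)
tex <- readLines("example-demo.tex")
setwd(old_wd)
```

Rounding inside `\Sexpr{}` keeps the inline number short; the chunk output uses R's default printing (7 significant digits).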
Compiling an Sweave Document
• Manually (Windows or Mac)
  1. Run Sweave("foo.Rnw") in the R console
  2. Open foo.tex in a TeX editor
  3. Compile the PDF using the TeX editor
  4. Run Stangle("foo.Rnw") to extract the R code, if desired
• Manually (Linux/Unix)
  1. Run R CMD Sweave foo.Rnw
  2. Run pdflatex foo or latex foo
• Integrated Development Environment (IDE)
  ▸ RStudio, Emacs (ESS), Eclipse (StatET), etc.
  ▸ If supported, usually one click/command for all steps (Sweave, compile TeX, view PDF)
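The manual steps can also be scripted from a single R session. This sketch uses `Sweave()`, `tools::texi2pdf()`, and `Stangle()`; it only attempts the LaTeX step when a `pdflatex` binary is found on the PATH, and the tiny `foo.Rnw` written to `tempdir()` is a stand-in for a real document.

```r
# Weave, (optionally) compile, and tangle foo.Rnw from R.
old_wd <- setwd(tempdir())
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "summary(c(2, 4, 6))",
  "@",
  "\\end{document}"
), "foo.Rnw")

Sweave("foo.Rnw", quiet = TRUE)    # step 1: foo.Rnw -> foo.tex

# steps 2-3: compile only if pdflatex is available; a failed compile is reported, not fatal
if (nzchar(Sys.which("pdflatex"))) {
  try(tools::texi2pdf("foo.tex", clean = TRUE))
}

Stangle("foo.Rnw", quiet = TRUE)   # step 4: foo.Rnw -> foo.R
setwd(old_wd)
```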
RStudio
The xtable Package: Basic Table Code
R package to convert many R objects to LaTeX or HTML tables
The xtable package: Basic Table Output
             F    M
BLACK       11   12
HISPANIC     8   12
OTHER        2    0
WHITE       30   25
Table: Distribution of gender and ethnicity

                   Estimate   Std. Error   t value   Pr(>|t|)
(Intercept)         71.0226       3.2894     21.59     0.0000
sexM                 3.3734       2.8594      1.18     0.2410
ethnictyHISPANIC    -3.7466       4.3044     -0.87     0.3863
ethnictyOTHER       18.4774      10.4716      1.76     0.0809
ethnictyWHITE        7.4622       3.4964      2.13     0.0354
Table: Linear Model Results
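The code for these two tables did not survive extraction. A sketch of one plausible version follows, assuming the `tli` data set that ships with xtable and an additive model `tlimth ~ sex + ethnicty` (which matches the five coefficient rows above). Inside an .Rnw, the `print()` calls would sit in a chunk with `results=tex`; here `print.results = FALSE` just returns the LaTeX as a string.

```r
# Requires the xtable package; skipped if it is not installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  library(xtable)
  data(tli)  # example data shipped with xtable

  tab <- table(tli$ethnicty, tli$sex)
  fit <- lm(tlimth ~ sex + ethnicty, data = tli)  # assumed model specification

  tab_tex <- print(xtable(tab, caption = "Distribution of gender and ethnicity"),
                   print.results = FALSE)
  fit_tex <- print(xtable(fit, caption = "Linear Model Results"),
                   print.results = FALSE)
  cat(fit_tex)
}
```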
The xtable package: Customized Tables
> mat <- round(matrix(c(0.9, 0.89, 200, 0.045, 2.0),
+   nrow = 1), 4)
> rownames(mat) <- "$y_{t-1}$"
> colnames(mat) <- c("$R^2$", "$\\bar{R}^2$",
+   "F-stat", "S.E.E", "DW")
> mat <- xtable(mat)
> print(mat, sanitize.text.function = function(x){x})
            R²     R̄²     F-stat   S.E.E   DW
y_{t-1}     0.90   0.89   200.00   0.04    2.00
Almost all functionality available for LaTeX tables can be specified directly in R code using xtable
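A sketch of a few more LaTeX-side settings controlled from R, building on the matrix above: column alignment, per-column digits, a caption, and a label. The caption text and the label `tab:diag` are hypothetical; `align`, `digits`, `caption`, and `label` are standard `xtable()` arguments.

```r
# Requires the xtable package; skipped if it is not installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  library(xtable)
  mat <- round(matrix(c(0.9, 0.89, 200, 0.045, 2.0), nrow = 1), 4)
  rownames(mat) <- "$y_{t-1}$"
  colnames(mat) <- c("$R^2$", "$\\bar{R}^2$", "F-stat", "S.E.E", "DW")

  # align and digits each need one entry per column plus one for the row names
  tab <- xtable(mat, align = "lrrrrr", digits = c(0, 2, 2, 0, 3, 2),
                caption = "Regression diagnostics", label = "tab:diag")

  # identity sanitizer keeps the $...$ math in row and column names intact
  out <- print(tab, sanitize.text.function = function(x) x,
               print.results = FALSE)
  cat(out)
}
```

With `digits = 3` for the S.E.E column, 0.045 is printed in full rather than rounded to 0.04.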
Aside: Using xtable for MS Word Tables
Non-statistical collaborators often prefer tabular results in MS Word
print(xtable(table(tli$ethnicty, tli$sex)), type = "html", file = "TabGenderRace.htm")
1. Save the results in an HTML file by printing an xtable object with type = "html" in R
2. Open “TabGenderRace.htm” in a browser
3. Copy and paste into a Word document as a fully formatted table
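A self-contained version of step 1, assuming the `tli` data from xtable. It writes the file to `tempdir()` rather than the working directory, and sets `html.table.attributes` explicitly so the table has visible borders when pasted into Word.

```r
# Requires the xtable package; skipped if it is not installed.
if (requireNamespace("xtable", quietly = TRUE)) {
  library(xtable)
  data(tli)
  html_file <- file.path(tempdir(), "TabGenderRace.htm")
  print(xtable(table(tli$ethnicty, tli$sex),
               caption = "Distribution of gender and ethnicity"),
        type = "html", file = html_file,
        html.table.attributes = "border=1")
  # Step 2 (interactive sessions only): browseURL(html_file)
}
```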
Basic Figure Example
[Figure: scatterplot of rnorm(10) (y-axis) against 1:10 (x-axis)]
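A sketch of a figure chunk that would produce a plot like this one, placed inside a LaTeX figure environment. The label `scatter`, the caption, and the added `set.seed(1)` (for reproducible draws) are assumptions. With `fig=TRUE`, Sweave writes the plot to its own file named from the document prefix and chunk label, then inserts an `\includegraphics` line pointing to it.

```r
# Weave a fig=TRUE chunk wrapped in a figure environment.
old_wd <- setwd(tempdir())
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "\\begin{figure}[htbp]",
  "\\centering",
  "<<scatter, fig=TRUE, echo=FALSE, width=5, height=4>>=",
  "set.seed(1)",
  "plot(1:10, rnorm(10))",
  "@",
  "\\caption{Ten standard normal draws against their index.}",
  "\\end{figure}",
  "\\end{document}"
), "figdemo.Rnw")
Sweave("figdemo.Rnw", quiet = TRUE)
tex <- readLines("figdemo.tex")
setwd(old_wd)
```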
NB: Embed the figure chunk within a LaTeX figure environment for more precise control

Large or Computationally Intensive Projects
• Split the document with \input statements or use makefiles
• save() and load() intermediate results
• Conditional evaluation: if (file exists) {load file} else {run; save file}
• Change R chunk evaluation options as necessary
• R package cacheSweave to cache intermediate results
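The conditional-evaluation pattern in R, as a minimal sketch: the cache file name is hypothetical, and the `replicate()` call is only a stand-in for an expensive computation.

```r
# Load a cached result if it exists; otherwise compute it and save it for next time.
cache_file <- file.path(tempdir(), "slow-result.RData")  # hypothetical cache location

if (file.exists(cache_file)) {
  load(cache_file)                                     # restores `slow_result`
} else {
  slow_result <- replicate(200, mean(rnorm(1e4)))      # stand-in for an expensive step
  save(slow_result, file = cache_file)
}
```

On the next weave, the chunk loads `slow_result` instead of recomputing it; deleting the cache file forces a fresh run.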
Including R code as an Appendix
• Useful for homework, solution sets, etc.
• Include \usepackage{listings} in the preamble
• Include an R chunk and TeX code in foo.Rnw where you would like to place the appendix
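The original chunk for this slide was lost in extraction. One common approach, sketched here as an assumption rather than the author's exact code: a hidden chunk calls `Stangle()` on the document itself, and the listings package's `\lstinputlisting` pulls the resulting .R file into the appendix. The file name `appendix-demo.Rnw` and the analysis chunk are hypothetical.

```r
# The document tangles itself during weaving, then lists the extracted code.
old_wd <- setwd(tempdir())
writeLines(c(
  "\\documentclass{article}",
  "\\usepackage{listings}",
  "\\begin{document}",
  "<<analysis>>=",
  "fit <- lm(dist ~ speed, data = cars)",
  "@",
  "\\section*{Appendix: R code}",
  "<<tangle-self, echo=FALSE, results=hide>>=",
  "Stangle('appendix-demo.Rnw', quiet = TRUE)",
  "@",
  "\\lstinputlisting[language=R]{appendix-demo.R}",
  "\\end{document}"
), "appendix-demo.Rnw")
Sweave("appendix-demo.Rnw", quiet = TRUE)
tex   <- readLines("appendix-demo.tex")
rcode <- readLines("appendix-demo.R")
setwd(old_wd)
```

Note that the tangled file also contains the `Stangle()` call itself; a chunk option or a separate tangling step avoids that if it matters.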
Miscellaneous Sweave Tricks
• Load all libraries in one chunk with the results = hide option to suppress unwanted output (e.g. package dependencies)
• Beamer presentations
  ▸ Include the [fragile] option for every frame with R code to handle verbatim output
  ▸ For frames with TeX and verbatim output, include the [containsverbatim] option instead
• R graphics package ggplot2
  ▸ Wrap ggplot objects in print() so the plot is actually drawn in figure chunks
• R session information
  > toLatex(sessionInfo(), locale=F)
  ▸ R version 2.14.1 (2011-12-22), x86_64-pc-mingw32
  ▸ Base packages: base, datasets, graphics, grDevices, methods, stats, utils
  ▸ Other packages: xtable 1.7-0
  ▸ Loaded via a namespace (and not attached): tools 2.14.1
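The session-information call on its own, as a base-R sketch. In an .Rnw it would sit in a chunk with `echo=FALSE, results=tex` so that the returned LaTeX itemize list is typeset directly; `locale = FALSE` omits the locale line.

```r
# LaTeX description of the current R session (base R only).
si_tex <- toLatex(sessionInfo(), locale = FALSE)
writeLines(si_tex)
```

The exact contents depend on the R version and attached packages, which is the point: the report records the environment it was built in.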
Alternatives for Reproducible Research
• R for other document formats
  ▸ HTML: R2HTML
  ▸ OpenOffice: odfWeave
  ▸ MS Word: SWord
  ▸ MS PowerPoint: R2PPT
• Other statistical packages
  ▸ StatWeave for SAS, Stata, or MATLAB with LaTeX or OpenOffice
  ▸ Various other software-specific report generators
Resources
• Sweave user manual (Friedrich Leisch): http://www.stat.uni-muenchen.de/~leisch/Sweave/Sweave-manual.pdf
• Stack Overflow questions tagged Sweave: http://stackoverflow.com/questions/tagged/sweave
• Keith Baggerly’s introduction to Sweave: http://bioinformatics.mdanderson.org/SweaveTalk/sweaveTalkb.pdf
• Quick-R summary of alternatives to Sweave: http://www.statmethods.net/interface/output.html
• Citing R with Sweave: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/SweaveLatex/RCitation.pdf
• xtable gallery with examples: http://cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pdf