2019 / Cheatsheets

rstudio::conf 2019 / CHEATSHEETS

RStudio IDE :: CHEAT SHEET

Documents and Apps
Open Shiny, R Markdown, knitr, Sweave, LaTeX, .Rd files and more in Source Pane.
Toolbar: Check spelling • Render output • Choose output format • Choose output location • Insert code chunk
Toolbar: Jump to previous chunk • Jump to next chunk • Run selected lines • Publish to server • Show file outline
Access markdown guide at Help > Markdown Quick Reference
Chunk controls: Jump to chunk • Set knitr chunk options • Run this and all previous code chunks • Run this code chunk
RStudio recognizes that files named app.R, server.R, ui.R, and global.R belong to a shiny app.
App controls: Run app • Choose location to view app • Publish to shinyapps.io or server • Manage publish accounts

Write Code
Toolbar: Navigate tabs • Open in new window • Save • Find and replace • Compile as notebook • Run selected code
Toolbar: Cursors of shared users • Re-run previous code • Source with or without Echo • Show file outline
Multiple cursors/column selection with Alt + mouse drag.
Code diagnostics that appear in the margin. Hover over diagnostic symbols for details.
Syntax highlighting based on your file's extension.
Tab completion to finish function names, file paths, arguments, and more.
Multi-language code snippets to quickly use common blocks of code.
Status bar: Jump to function in file • Change file type
Console: Working Directory • Maximize, minimize panes • Press ↑ to see command history • Drag pane boundaries

R Support
Import data with wizard • History of past commands to run/copy • Display .RPres slideshows (File > New File > R Presentation)
Environment pane: Load workspace • Save workspace • Delete all saved objects • Search inside environment
Choose environment to display from list of parent environments. Display objects as list or grid.
Displays saved objects by type with short description. View in data viewer. View function source code.
RStudio opens plots in a dedicated Plots pane.
Plots pane: Navigate recent plots • Open in window • Export plot • Delete plot • Delete all plots
GUI Package manager lists every installed package.
Packages pane: Install Packages • Update Packages • Create reproducible package library for your project
Click to load package with library(). Unclick to detach package with detach().
Package version installed • Delete from library
RStudio opens documentation in a dedicated Help pane.
Help pane: Home page of helpful links • Search within help file • Search for help file
Viewer Pane displays HTML content, such as Shiny apps, RMarkdown reports, and interactive visualizations.
Viewer pane: Stop Shiny app • Publish to shinyapps.io, rpubs, RSConnect, … • Refresh
View() opens spreadsheet like view of data set.
Data viewer: Filter rows by value or value range • Sort by values • Search for value

Pro Features
Share Project with Collaborators • Active shared collaborators
Start new R Session in current project • Close R Session in project • Select R Version

PROJECT SYSTEM
File > New Project
RStudio saves the call history, workspace, and working directory associated with a project. It reloads each when you re-open a project.
Name of current project is shown in the toolbar.
A File browser keyed to your working directory. Click on file or directory name to open.
File browser: Create folder • Upload file • Delete file • Rename file • Change directory • Path to displayed directory

Debug Mode
Open with debug(), browser(), or a breakpoint. RStudio will open the debugger mode when it encounters a breakpoint while executing code.
Click next to line number to add/remove a breakpoint.
Highlighted line shows where execution has paused.
Launch debugger mode from origin of error.
Open traceback to examine the functions that R called before the error occurred.
Run commands in environment where execution has paused.
Examine variables in executing environment.
Select function in traceback to debug.
Debugger controls: Step through code one line at a time • Step into and out of functions to run • Resume execution • Quit debug mode

Version Control with Git or SVN
Turn on at Tools > Project Options > Git/SVN
Git pane: Stage files • Show file diff • Commit staged files • Push/Pull to remote • View History
File status: A Added • D Deleted • M Modified • R Renamed • ? Untracked
Open shell to type commands • current branch

Package Writing
File > New Project > New Directory > R Package
Turn project into package. Enable roxygen documentation with Tools > Project Options > Build Tools
Roxygen guide at Help > Roxygen Quick Reference
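The Debug Mode entry points above can be tried with a small script. This is a minimal sketch: scale_values() is a hypothetical function invented for illustration, and the interactive() guard is an added assumption so the script also runs outside RStudio without pausing.

```r
# Hypothetical helper, used only to illustrate entering the debugger.
scale_values <- function(x, factor = 2) {
  # browser() pauses execution at this line and opens debug mode;
  # the guard keeps non-interactive runs (e.g. Rscript) from stopping here.
  if (interactive()) browser()
  x * factor
}

# Alternative entry point: flag an existing function so the debugger
# opens the next time it is called, then remove the flag afterwards.
# debug(scale_values); scale_values(1:3); undebug(scale_values)

result <- scale_values(1:3)
print(result)
```

In an interactive session the call pauses inside scale_values(), where the debugger controls listed above (step, step into, resume, quit) apply.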

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com • RStudio IDE 0.99.832 • Updated: 2016-01

1 LAYOUT                         Windows/Linux                 Mac
Move focus to Source Editor      Ctrl+1                        Ctrl+1
Move focus to Console            Ctrl+2                        Ctrl+2
Move focus to Help               Ctrl+3                        Ctrl+3
Show History                     Ctrl+4                        Ctrl+4
Show Files                       Ctrl+5                        Ctrl+5
Show Plots                       Ctrl+6                        Ctrl+6
Show Packages                    Ctrl+7                        Ctrl+7
Show Environment                 Ctrl+8                        Ctrl+8
Show Git/SVN                     Ctrl+9                        Ctrl+9
Show Build                       Ctrl+0                        Ctrl+0

2 RUN CODE                       Windows/Linux                 Mac
Search command history           Ctrl+↑                        Cmd+↑
Navigate command history         ↑/↓                           ↑/↓
Move cursor to start of line     Home                          Cmd+←
Move cursor to end of line       End                           Cmd+→
Change working directory         Ctrl+Shift+H                  Ctrl+Shift+H
Interrupt current command        Esc                           Esc
Clear console                    Ctrl+L                        Ctrl+L
Quit Session (desktop only)      Ctrl+Q                        Cmd+Q
Restart R Session                Ctrl+Shift+F10                Cmd+Shift+F10
Run current line/selection       Ctrl+Enter                    Cmd+Enter
Run current (retain cursor)      Alt+Enter                     Option+Enter
Run from current to end          Ctrl+Alt+E                    Cmd+Option+E
Run the current function         Ctrl+Alt+F                    Cmd+Option+F
Source a file                    Ctrl+Alt+G                    Cmd+Option+G
Source the current file          Ctrl+Shift+S                  Cmd+Shift+S
Source with echo                 Ctrl+Shift+Enter              Cmd+Shift+Enter

3 NAVIGATE CODE                  Windows/Linux                 Mac
Goto File/Function               Ctrl+.                        Ctrl+.
Fold Selected                    Alt+L                         Cmd+Option+L
Unfold Selected                  Shift+Alt+L                   Cmd+Shift+Option+L
Fold All                         Alt+O                         Cmd+Option+O
Unfold All                       Shift+Alt+O                   Cmd+Shift+Option+O
Go to line                       Shift+Alt+G                   Cmd+Shift+Option+G
Jump to                          Shift+Alt+J                   Cmd+Shift+Option+J
Switch to tab                    Ctrl+Shift+.                  Ctrl+Shift+.
Previous tab                     Ctrl+F11                      Ctrl+F11
Next tab                         Ctrl+F12                      Ctrl+F12
First tab                        Ctrl+Shift+F11                Ctrl+Shift+F11
Last tab                         Ctrl+Shift+F12                Ctrl+Shift+F12
Navigate back                    Ctrl+F9                       Cmd+F9
Navigate forward                 Ctrl+F10                      Cmd+F10
Jump to Brace                    Ctrl+P                        Ctrl+P
Select within Braces             Ctrl+Shift+Alt+E              Ctrl+Shift+Option+E
Use Selection for Find           Ctrl+F3                       Cmd+E
Find in Files                    Ctrl+Shift+F                  Cmd+Shift+F
Find Next                        Win: F3, Linux: Ctrl+G        Cmd+G
Find Previous                    W: Shift+F3, L: Ctrl+Shift+G  Cmd+Shift+G
Jump to Word                     Ctrl+←/→                      Option+←/→
Jump to Start/End                Ctrl+↑/↓                      Cmd+↑/↓
Toggle Outline                   Ctrl+Shift+O                  Cmd+Shift+O

4 WRITE CODE                     Windows/Linux                 Mac
Attempt completion               Tab or Ctrl+Space             Tab or Cmd+Space
Navigate candidates              ↑/↓                           ↑/↓
Accept candidate                 Enter, Tab, or →              Enter, Tab, or →
Dismiss candidates               Esc                           Esc
Undo                             Ctrl+Z                        Cmd+Z
Redo                             Ctrl+Shift+Z                  Cmd+Shift+Z
Cut                              Ctrl+X                        Cmd+X
Copy                             Ctrl+C                        Cmd+C
Paste                            Ctrl+V                        Cmd+V
Select All                       Ctrl+A                        Cmd+A
Delete Line                      Ctrl+D                        Cmd+D
Select                           Shift+[Arrow]                 Shift+[Arrow]
Select Word                      Ctrl+Shift+←/→                Option+Shift+←/→
Select to Line Start             Alt+Shift+←                   Cmd+Shift+←
Select to Line End               Alt+Shift+→                   Cmd+Shift+→
Select Page Up/Down              Shift+PageUp/Down             Shift+PageUp/Down
Select to Start/End              Shift+Alt+↑/↓                 Cmd+Shift+↑/↓
Delete Word Left                 Ctrl+Backspace                Ctrl+Opt+Backspace
Delete Word Right                                              Option+Delete
Delete to Line End                                             Ctrl+K
Delete to Line Start                                           Option+Backspace
Indent                           Tab (at start of line)        Tab (at start of line)
Outdent                          Shift+Tab                     Shift+Tab
Yank line up to cursor           Ctrl+U                        Ctrl+U
Yank line after cursor           Ctrl+K                        Ctrl+K
Insert yanked text               Ctrl+Y                        Ctrl+Y
Insert <-                        Alt+-                         Option+-
Insert %>%                       Ctrl+Shift+M                  Cmd+Shift+M
Show help for function           F1                            F1
Show source code                 F2                            F2
New document                     Ctrl+Shift+N                  Cmd+Shift+N
New document (Chrome)            Ctrl+Alt+Shift+N              Cmd+Shift+Opt+N
Open document                    Ctrl+O                        Cmd+O
Save document                    Ctrl+S                        Cmd+S
Close document                   Ctrl+W                        Cmd+W
Close document (Chrome)          Ctrl+Alt+W                    Cmd+Option+W
Close all documents              Ctrl+Shift+W                  Cmd+Shift+W
Extract function                 Ctrl+Alt+X                    Cmd+Option+X
Extract variable                 Ctrl+Alt+V                    Cmd+Option+V
Reindent lines                   Ctrl+I                        Cmd+I
(Un)Comment lines                Ctrl+Shift+C                  Cmd+Shift+C
Reflow Comment                   Ctrl+Shift+/                  Cmd+Shift+/
Reformat Selection               Ctrl+Shift+A                  Cmd+Shift+A
Select within braces             Ctrl+Shift+E                  Ctrl+Shift+E
Show Diagnostics                 Ctrl+Shift+Alt+P              Cmd+Shift+Opt+P
Transpose Letters                                              Ctrl+T
Move Lines Up/Down               Alt+↑/↓                       Option+↑/↓
Copy Lines Up/Down               Shift+Alt+↑/↓                 Cmd+Option+↑/↓
Add New Cursor Above             Ctrl+Alt+Up                   Ctrl+Option+Up
Add New Cursor Below             Ctrl+Alt+Down                 Ctrl+Option+Down
Move Active Cursor Up            Ctrl+Alt+Shift+Up             Ctrl+Option+Shift+Up
Move Active Cursor Down          Ctrl+Alt+Shift+Down           Ctrl+Opt+Shift+Down
Find and Replace                 Ctrl+F                        Cmd+F
Use Selection for Find           Ctrl+F3                       Cmd+E
Replace and Find                 Ctrl+Shift+J                  Cmd+Shift+J

5 DEBUG CODE                     Windows/Linux                 Mac
Toggle Breakpoint                Shift+F9                      Shift+F9
Execute Next Line                F10                           F10
Step Into Function               Shift+F4                      Shift+F4
Finish Function/Loop             Shift+F6                      Shift+F6
Continue                         Shift+F5                      Shift+F5
Stop Debugging                   Shift+F8                      Shift+F8

6 VERSION CONTROL                Windows/Linux                 Mac
Show diff                        Ctrl+Alt+D                    Ctrl+Option+D
Commit changes                   Ctrl+Alt+M                    Ctrl+Option+M
Scroll diff view                 Ctrl+↑/↓                      Ctrl+↑/↓
Stage/Unstage (Git)              Spacebar                      Spacebar
Stage/Unstage and move to next   Enter                         Enter

7 MAKE PACKAGES                  Windows/Linux                 Mac
Build and Reload                 Ctrl+Shift+B                  Cmd+Shift+B
Load All (devtools)              Ctrl+Shift+L                  Cmd+Shift+L
Test Package (Desktop)           Ctrl+Shift+T                  Cmd+Shift+T
Test Package (Web)               Ctrl+Alt+F7                   Cmd+Opt+F7
Check Package                    Ctrl+Shift+E                  Cmd+Shift+E
Document Package                 Ctrl+Shift+D                  Cmd+Shift+D

8 DOCUMENTS AND APPS             Windows/Linux                 Mac
Preview HTML (Markdown, etc.)    Ctrl+Shift+K                  Cmd+Shift+K
Knit Document (knitr)            Ctrl+Shift+K                  Cmd+Shift+K
Compile Notebook                 Ctrl+Shift+K                  Cmd+Shift+K
Compile PDF (TeX and Sweave)     Ctrl+Shift+K                  Cmd+Shift+K
Insert chunk (Sweave and Knitr)  Ctrl+Alt+I                    Cmd+Option+I
Insert code section              Ctrl+Shift+R                  Cmd+Shift+R
Re-run previous region           Ctrl+Shift+P                  Cmd+Shift+P
Run current document             Ctrl+Alt+R                    Cmd+Option+R
Run from start to current line   Ctrl+Alt+B                    Cmd+Option+B
Run the current code section     Ctrl+Alt+T                    Cmd+Option+T
Run previous Sweave/Rmd code     Ctrl+Alt+P                    Cmd+Option+P
Run the current chunk            Ctrl+Alt+C                    Cmd+Option+C
Run the next chunk               Ctrl+Alt+N                    Cmd+Option+N
Sync Editor & PDF Preview        Ctrl+F8                       Cmd+F8
Previous plot                    Ctrl+Alt+F11                  Cmd+Option+F11
Next plot                        Ctrl+Alt+F12                  Cmd+Option+F12
Show Keyboard Shortcuts          Alt+Shift+K                   Option+Shift+K

WHY RSTUDIO SERVER PRO?
RSP extends the open source server with a commercial license, support, and more:
• open and run multiple R sessions at once
• tune your resources to improve performance
• edit the same project at the same time as others
• see what you and others are doing on your server
• switch easily from one version of R to a different version
• integrate with your authentication, authorization, and audit practices
Download a free 45 day evaluation at www.rstudio.com/products/rstudio-server-pro/

Learn more at www.rstudio.com • RStudio IDE 0.1.0 • Updated: 2017-09

Shiny :: CHEAT SHEET

Basics
A Shiny app is a web page (UI) connected to a computer running a live R session (Server).
Users can manipulate the UI, which will cause the server to update the UI's displays (by running R code).

APP TEMPLATE
Begin writing a new app with this template. Preview the app by running the code at the R command line.

    library(shiny)
    ui <- fluidPage()
    server <- function(input, output){}
    shinyApp(ui = ui, server = server)

• ui - nested R functions that assemble an HTML user interface for your app
• server - a function with instructions on how to build and rebuild the R objects displayed in the UI
• shinyApp - combines ui and server into an app. Wrap with runApp() if calling from a sourced script or inside a function.

SHARE YOUR APP
The easiest way to share your app is to host it on shinyapps.io, a cloud based service from RStudio.
1. Create a free or professional account at http://shinyapps.io
2. Click the Publish icon in the RStudio IDE or run: rsconnect::deployApp("<path to directory>")
Build or purchase your own Shiny Server at www.rstudio.com/products/shiny-server/

Building an App
Complete the template by adding arguments to fluidPage() and a body to the server function.
Add inputs to the UI with *Input() functions.
Add outputs with *Output() functions.
Tell server how to render outputs with R in the server function. To do this:
1. Refer to outputs with output$<id>
2. Refer to inputs with input$<id>
3. Wrap code in a render*() function before saving to output

    library(shiny)
    ui <- fluidPage(
      numericInput(inputId = "n", "Sample size", value = 25),
      plotOutput(outputId = "hist")
    )
    server <- function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n))
      })
    }
    shinyApp(ui = ui, server = server)

Save your template as app.R. Alternatively, split your template into two files named ui.R and server.R.

    # ui.R
    fluidPage(
      numericInput(inputId = "n", "Sample size", value = 25),
      plotOutput(outputId = "hist")
    )

    # server.R
    function(input, output) {
      output$hist <- renderPlot({
        hist(rnorm(input$n))
      })
    }

ui.R contains everything you would save to ui. server.R ends with the function you would save to server. No need to call shinyApp().

Save each app as a directory that holds an app.R file (or a server.R file and a ui.R file) plus optional extra files.
app-name     The directory name is the name of the app
  app.R
  global.R   (optional) defines objects available to both ui.R and server.R
  README     (optional)
  other files (optional) data, scripts, etc.
  www        (optional) directory of files to share with web browsers (images, CSS, .js, etc.) Must be named "www"
Launch apps with runApp(<path to directory>)

Inputs
Collect values from the user. Access the current value of an input object with input$<inputId>. Input values are reactive.
actionButton(inputId, label, icon, …)
actionLink(inputId, label, icon, …)
checkboxGroupInput(inputId, label, choices, selected, inline)
checkboxInput(inputId, label, value)
dateInput(inputId, label, value, min, max, format, startview, weekstart, language)
dateRangeInput(inputId, label, start, end, min, max, format, startview, weekstart, language, separator)
fileInput(inputId, label, multiple, accept)
numericInput(inputId, label, value, min, max, step)
passwordInput(inputId, label, value)
radioButtons(inputId, label, choices, selected, inline)
selectInput(inputId, label, choices, selected, multiple, selectize, width, size) (also selectizeInput())
sliderInput(inputId, label, min, max, value, step, round, format, locale, ticks, animate, width, sep, pre, post)
submitButton(text, icon) (Prevents reactions across entire app)
textInput(inputId, label, value)

Outputs - render*() and *Output() functions work together to add R output to the UI
DT::renderDataTable(expr, options, callback, escape, env, quoted) works with dataTableOutput(outputId, icon, …)
renderImage(expr, env, quoted, deleteFile) works with imageOutput(outputId, width, height, click, dblclick, hover, hoverDelay, inline, hoverDelayType, brush, clickId, hoverId)
renderPlot(expr, width, height, res, …, env, quoted, func) works with plotOutput(outputId, width, height, click, dblclick, hover, hoverDelay, inline, hoverDelayType, brush, clickId, hoverId)
renderPrint(expr, env, quoted, func, width) works with verbatimTextOutput(outputId)
renderTable(expr,…, env, quoted, func) works with tableOutput(outputId)
renderText(expr, env, quoted, func) works with textOutput(outputId, container, inline)
renderUI(expr, env, quoted, func) works with uiOutput(outputId, inline, container, …) & htmlOutput(outputId, inline, container, …)
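The render*() and *Output() pairing can be sketched with two of the pairs listed above. This is only an illustration: the mtcars data set and the ids "col", "rows", "preview", and "stats" are assumptions chosen for the example, not part of the cheat sheet.

```r
library(shiny)

# UI: two inputs plus one *Output() placeholder per rendered object.
ui <- fluidPage(
  selectInput(inputId = "col", label = "Column", choices = names(mtcars)),
  sliderInput(inputId = "rows", label = "Rows to show",
              min = 1, max = 10, value = 5),
  tableOutput(outputId = "preview"),        # filled by renderTable()
  verbatimTextOutput(outputId = "stats")    # filled by renderPrint()
)

server <- function(input, output) {
  # renderTable() pairs with tableOutput(); it reruns when input$col
  # or input$rows changes.
  output$preview <- renderTable({
    head(mtcars[, input$col, drop = FALSE], input$rows)
  })
  # renderPrint() pairs with verbatimTextOutput().
  output$stats <- renderPrint({
    summary(mtcars[[input$col]])
  })
}

app <- shinyApp(ui = ui, server = server)

# Launch only in an interactive session so sourcing the file does not block.
if (interactive()) runApp(app)
```

Each output id appears twice, once in the UI placeholder and once as output$<id> in the server function; a mismatch between the two leaves the placeholder empty.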

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at shiny.rstudio.com • shiny 0.12.0 • Updated: 2016-01 Reactivity UI - An app’s UI is an HTML document. Layouts Reactive values work together with reactive functions. Call a reactive value from within the arguments of one Use Shiny’s functions to assemble this HTML with R. Combine multiple elements of these functions to avoid the error Operation not allowed without an active reactive context. fluidPage( into a "single element" that textInput("a","") Returns has its own properties with ) HTML a panel function, e.g. ##

wellPanel(dateInput("a", ""), ## submitButton() ## absolutePanel() navlistPanel() ##

conditionalPanel() sidebarPanel() ##

fixedPanel() tabPanel() headerPanel() tabsetPanel() Add static HTML elements with tags, a list of inputPanel() titlePanel() functions that parallel common HTML tags, e.g. mainPanel() wellPanel() tags$a(). Unnamed arguments will be passed into the tag; named arguments will become tag Organize panels and elements into a layout with a attributes. layout function. Add elements as arguments of the layout functions. tags$a tags$data tags$h6 tags$nav tags$span fluidRow() tags$abbr tags$datalist tags$head tags$ noscript tags$ strong tags$ tags$ tags$ tags$ tags$ ui <- fluidPage( address dd header object style column col tags$area tags$del tags$hgroup tags$ ol tags$sub row fluidRow(column(width = 4), tags$article tags$details tags$hr tags$optgroup tags$ summary column(width = 2, oﬀset = 3)), tags$aside tags$dfn tags$HTML tags$option tags$sup column fluidRow(column(width = 12)) tags$audio tags$div tags$i tags$output tags$table ) CREATE YOUR OWN REACTIVE VALUES RENDER REACTIVE OUTPUT tags$b tags$dl tags$iframe tags$ p tags$tbody tags$base tags$dt tags$img tags$param tags$td # example snippets *Input() functions library(shiny) render*() functions tags$bdi tags$em tags$input tags$ pre tags$textarea flowLayout() (see front page) tags$bdo tags$embed tags$ins tags$progress tags$ tfoot ui <- fluidPage( ui <- fluidPage( (see front page) object object textInput("a","","A"), tags$blockquote tags$eventsource tags$kbd tags$q tags$th object flowLayout( # object 1, ui <- fluidPage( tags$body tags$fieldset tags$keygen tags$ruby tags$thead 1 2 3 textInput("a","","A") reactiveValues(…) textOutput("b") # object 2, ) ) Builds an object to tags$br tags$figcaption tags$label tags$rp tags$time # object 3 tags$button tags$figure tags$legend tags$ rt tags$title object 3 ) server <- display. 
Will rerun code in tags$canvas tags$footer tags$li tags$s tags$tr Each input function function(input,output){ ) server <- creates a reactive value body to rebuild the object tags$caption tags$form tags$link tags$samp tags$track function(input,output){ output$b <- tags$cite tags$h1 tags$mark tags$ script tags$u renderText({ whenever a reactive value sidebarLayout() rv <- reactiveValues() stored as input$ tags$code tags$h2 tags$map tags$ section tags$ul ui <- fluidPage( rv$number <- 5 input$a in the code changes. }) tags$col tags$h3 tags$menu tags$select tags$var sidebarLayout( } reactiveValues() creates a tags$colgroup tags$h4 tags$meta tags$small tags$video } Save the results to sidebarPanel(), list of reactive values tags$command tags$h5 tags$meter tags$ source tags$wbr side main mainPanel() shinyApp(ui, server) output$ panel whose values you can set. The most common tags have wrapper functions. You panel ) ) do not need to prefix their names with tags$ PREVENT REACTIONS TRIGGER ARBITRARY CODE ui <- fluidPage( splitLayout() library(shiny) isolate(expr) library(shiny) observeEvent(eventExpr h1("Header 1"), ui <- fluidPage( hr(), ui <- fluidPage( ui <- fluidPage( , handlerExpr, event.env, splitLayout( # object 1, textInput("a","","A"), Runs a code block. textInput("a","","A"), event.quoted, handler.env, br(), object object # object 2 textOutput("b") Returns a non-reactive actionButton("go","Go") p(strong("bold")), ) handler.quoted, labe, 1 2 ) ) copy of the results. p(em("italic")), ) suspended, priority, domain, p(code("code")), server <- server <- autoDestroy, ignoreNULL) function(input,output){ function(input,output){ a(href="", "link"), output$b <- observeEvent(input$go,{ HTML("

Raw html

") verticalLayout() ui <- fluidPage( renderText({ print(input$a) Runs code in 2nd ) verticalLayout( # object 1, isolate({input$a}) }) argument when reactive object 1 # object 2, }) } # object 3 } values in 1st argument object 2 change. See observe() for To include a CSS file, use includeCSS(), or ) shinyApp(ui, server) shinyApp(ui, server) alternative. 1. Place the file in the www subdirectory object 3 ) 2. Link to it with Layer tabPanels on top of each other, MODULARIZE REACTIONS DELAY REACTIONS tags$head(tags$link(rel = "stylesheet", and navigate between them, with: type = "text/css", href = "")) ui <- fluidPage( library(shiny) textInput("a","","A"), reactive(x, env, quoted, eventReactive(eventExpr, ui <- fluidPage( tabsetPanel( textInput("z","","Z"), label, domain) ui <- fluidPage( valueExpr, event.env, tabPanel("tab 1", "contents"), textOutput("b")) textInput("a","","A"), tabPanel("tab 2", "contents"), Creates a reactive expression actionButton("go","Go"), event.quoted, value.env, To include JavaScript, use includeScript() or server <- that textOutput("b") tabPanel("tab 3", "contents"))) function(input,output){ ) value.quoted, label, 1. Place the file in the www subdirectory re <- reactive({ • caches its value to reduce domain, ignoreNULL) 2. Link to it with ui <- fluidPage( navlistPanel( computation server <- paste(input$a,input$z)}) function(input,output){ Creates reactive tabPanel("tab 1", "contents"), output$b <- renderText({ re <- eventReactive( tabPanel("tab 2", "contents"), re() • can be called by other code input$go,{input$a}) expression with code in tags$head(tags$script(src = "")) }) output$b <- renderText({ tabPanel("tab 3", "contents"))) } • notifies its dependencies re() 2nd argument that only shinyApp(ui, server) when it ha been invalidated }) invalidates when reactive IMAGES } To include an image ui <- navbarPage(title = "Page", Call the expression with values in 1st argument 1. 
Place the file in the www subdirectory tabPanel("tab 1", "contents"), function syntax, e.g. re() shinyApp(ui, server) change. tabPanel("tab 2", "contents"), 2. Link to it with img(src="") tabPanel("tab 3", "contents"))
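The reactive helpers above (reactive(), eventReactive(), observeEvent(), isolate()) can be combined in one small app. The following is a minimal sketch, not part of the original sheet: the IDs a, z, go, b, and c are arbitrary choices, the server takes a session argument for convenience, and the app object is only built here, not launched.

```r
library(shiny)

ui <- fluidPage(
  textInput("a", "", "A"),
  textInput("z", "", "Z"),
  actionButton("go", "Go"),
  textOutput("b"),
  textOutput("c")
)

server <- function(input, output, session) {
  # reactive(): cached expression, recomputed when input$a or input$z changes
  re <- reactive({
    paste(input$a, input$z)
  })
  output$b <- renderText({
    re()
  })

  # eventReactive(): value is only invalidated when input$go changes
  delayed <- eventReactive(input$go, {
    input$a
  })
  output$c <- renderText({
    delayed()
  })

  # observeEvent(): run side-effect code when input$go changes;
  # isolate() reads input$a without taking a reactive dependency on it
  observeEvent(input$go, {
    message("a is ", isolate(input$a))
  })
}

# Build the app object; launch it interactively with runApp(app)
app <- shinyApp(ui, server)
```

Output b updates on every keystroke, while output c changes only after the Go button is pressed.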

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at shiny.rstudio.com • shiny 0.12.0 • Updated: 2016-01

R Markdown : : CHEAT SHEET

What is R Markdown?
• .Rmd files · An R Markdown (.Rmd) file is a record of your research. It contains the code that a scientist needs to reproduce your work along with the narration that a reader needs to understand your work.
• Reproducible Research · At the click of a button, or the type of a command, you can rerun the code in an R Markdown file to reproduce your work and export the results as a finished report.
• Dynamic Documents · You can choose to export the finished report in a variety of formats, including html, pdf, MS Word, or RTF documents; html or pdf based slides, Notebooks, and more.

[Figure: RStudio IDE editing an .Rmd file. Callouts: file path to output document; find in document; set preview location; insert code chunk; go to code chunk; run code chunk(s); reload document; publish; show outline; run all previous chunks; modify chunk options; run current chunk; synch publish button to accounts at rpubs.com, shinyapps.io, RStudio Connect.]

.rmd Structure
• YAML Header · Optional section of render (e.g. pandoc) options written as key:value pairs (YAML). At start of file, between lines of - - -
• Text · Narration formatted with markdown, mixed with:
• Code Chunks · Chunks of embedded code. Each chunk begins with ```{r} and ends with ```. R Markdown will run the code and append the results to the doc. It will use the location of the .Rmd file as the working directory.

Workflow
1. Open a new .Rmd file at File ▶ New File ▶ R Markdown. Use the wizard that opens to pre-populate the file with a template.
2. Write document by editing template.
3. Knit document to create report; use knit button or render() to knit.
4. Preview Output in IDE window.
5. Publish (optional) to web server.
6. Examine build log in R Markdown console.
7. Use output file that is saved along side .Rmd.

render
Use rmarkdown::render() to render/knit at cmd line. Important args:
• input - file to render
• output_format
• output_options - List of render options (as in YAML)
• output_file
• output_dir
• params - list of params to use
• envir - environment to evaluate code chunks in
• encoding - of input file

Parameters
Parameterize your documents to reuse with different inputs (e.g., data, values, etc.)
1. Add parameters · Create and set parameters in the header as sub-values of params

    ---
    params:
      n: 100
      d: !r Sys.Date()
    ---

2. Call parameters · Call parameter values in code as params$<name>

    Today’s date is `r params$d`

3. Set parameters · Set values with Knit with parameters or the params argument of render():

    render("doc.Rmd", params = list(n = 1, d = as.Date("2015-01-01")))

Interactive Documents
Turn your report into an interactive Shiny document in 4 steps
1. Add runtime: shiny to the YAML header.
2. Call Shiny input functions to embed input objects.
3. Call Shiny render functions to embed reactive output.
4. Render with rmarkdown::run or click Run Document in RStudio IDE

    ---
    output: html_document
    runtime: shiny
    ---

    ```{r, echo = FALSE}
    numericInput("n", "How many cars?", 5)

    renderTable({
      head(cars, input$n)
    })
    ```

Embed a complete app into your document with shiny::shinyAppDir()
NOTE: Your report will be rendered as a Shiny app, which means you must choose an html output format, like html_document, and serve it with an active R Session.

Embed code with knitr syntax

INLINE CODE
Insert with `r <code>`. Results appear as text without code.

    Built with `r getRversion()`     (renders as: Built with 3.2.3)

CODE CHUNKS
One or more lines surrounded with ```{r} and ```. Place chunk options within curly braces, after r. Insert with the insert code chunk button in the IDE.

    ```{r echo=TRUE}
    getRversion()
    ```

GLOBAL OPTIONS
Set with knitr::opts_chunk$set(), e.g.

    ```{r include=FALSE}
    knitr::opts_chunk$set(echo = TRUE)
    ```

IMPORTANT CHUNK OPTIONS
• cache - cache results for future knits (default = FALSE)
• cache.path - directory to save cached results in (default = "cache/")
• child - file(s) to knit and then include (default = NULL)
• collapse - collapse all output into single block (default = FALSE)
• comment - prefix for each line of results (default = '##')
• dependson - chunk dependencies for caching (default = NULL)
• echo - Display code in output document (default = TRUE)
• engine - code language used in chunk (default = 'R')
• error - Display error messages in doc (TRUE) or stop render when errors occur (FALSE) (default = FALSE)
• eval - Run code in chunk (default = TRUE)
• fig.align - 'left', 'right', or 'center' (default = 'default')
• fig.cap - figure caption as character string (default = NULL)
• fig.height, fig.width - Dimensions of plots in inches
• highlight - highlight source code (default = TRUE)
• include - Include chunk in doc after running (default = TRUE)
• message - display code messages in document (default = TRUE)
• results (default = 'markup'): 'asis' - passthrough results; 'hide' - do not display results; 'hold' - put all results below all code
• tidy - tidy code for display (default = FALSE)
• warning - display code warnings in document (default = TRUE)

Options not listed above: R.options, aniopts, autodep, background, cache.comments, cache.lazy, cache.rebuild, cache.vars, dev, dev.args, dpi, engine.opts, engine.path, fig.asp, fig.env, fig.ext, fig.keep, fig.lp, fig.path, fig.pos, fig.process, fig.retina, fig.scap, fig.show, fig.showtext, fig.subcap, interval, out.extra, out.height, out.width, prompt, purl, ref.label, render, size, split, tidy.opts
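To see how inline code and a chunk option behave without rendering a full report, the knitting step can be run on its own. This is a minimal sketch under stated assumptions: it calls knitr::knit() on an in-memory document (rather than rmarkdown::render(), so pandoc is not needed), and the document text in src is made up for illustration.

```r
library(knitr)

# Three backticks, built here so the example document stays easy to read
fence <- strrep("`", 3)

# A tiny R Markdown source: one inline expression and one chunk with echo=FALSE
src <- c(
  "Inline result: `r 1 + 1`",
  "",
  paste0(fence, "{r echo=FALSE}"),
  "x <- 2 * 3",
  "x",
  fence
)

# knit() runs the code and returns the resulting markdown as text
out <- knit(text = src, quiet = TRUE)
cat(out)
```

Because the chunk sets echo=FALSE, the knitted markdown contains the chunk's printed result but not its source code; the inline expression is replaced by its value.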

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at rmarkdown.rstudio.com • rmarkdown 1.6 • Updated: 2016-02 Pandoc’s Markdown Set render options with YAML Write with syntax on the left to create eﬀect on right (after render) When you render, R Markdown rmarkdown Plain text 1. runs the R code, embeds results and text into .md file with knitr End a line with two spaces to start a new paragraph. 2. then converts the .md file into the finished format with pandoc *italics* and **bold** `verbatim code` sub/superscript^2^~2~ sub-option description ~~strikethrough~~ html pdf word odt rtf md gituhb ioslides slidy beamer escaped: \* \_ \\ citation_package The LaTeX package to process citations, natbib, biblatex or none X X X endash: --, emdash: --- equation: $A = \pi*r^{2}$ Set a document’s code_folding Let readers to toggle the display of R code, "none", "hide", or "show" X default output format --- equation block: output: html_document colortheme Beamer color theme to use X in the YAML header: --- $$E = mc^{2}$$ # Body css CSS file to use to style document X X X > block quote dev Graphics device to use for figure output (e.g. "png") X X X X X X X duration Add a countdown timer (in minutes) to footer of slides X # Header1 {#anchor} output value creates fig_caption Should figures be rendered with captions? X X X X X X X ## Header 2 {#css_id} html_document html fig_height, fig_width Default figure height and width (in inches) for document X X X X X X X X X X ### Header 3 {.css_class} pdf_document pdf (requires Tex ) word_document Microsoft Word (.docx) highlight Syntax highlighting: "tango", "pygments", "kate","zenburn", "textmate" X X X X X #### Header 4 odt_document OpenDocument Text includes File of content to place in document (in_header, before_body, after_body) X X X X X X X X ##### Header 5 rtf_document Rich Text Format incremental Should bullets appear one at a time (on presenter mouse clicks)? 
X X X ###### Header 6 md_document Markdown keep_md Save a copy of .md file that contains knitr output X X X X X X

github_document Github compatible markdown keep_tex Save a copy of .tex file that contains knitr output X X ioslides_presentation ioslides HTML slides latex_engine Engine to render latex, "pdflatex", "xelatex", or "lualatex" X X \textbf{Tex ignored in HTML} HTML ignored in pdfs slidy_presentation slidy HTML slides lib_dir Directory of dependency files to use (Bootstrap, MathJax, etc.) X X X beamer_presentation Beamer pdf slides (requires Tex) mathjax Set to local or a URL to use a local/URL version of MathJax to render equations X X X [link](www.rstudio.com) Jump to [Header 1](#anchor) Indent 2 Indent 4 md_extensions Markdown extensions to add to default definition or R Markdown X X X X X X X X X X spaces spaces image: Customize output with --- number_sections Add section numbering to headers X X sub-options (listed to output: html_document: ![Caption](smallorb.png) the right): code_folding: hide pandoc_args Additional arguments to pass to Pandoc X X X X X X X X X X * unordered list toc_float: TRUE preserve_yaml Preserve YAML front matter in final document? X + sub-item 1 --- + sub-item 2 # Body reference_docx docx file whose styles should be copied when producing docx output X - sub-sub-item 1 self_contained Embed dependencies into the doc X X X

* item 2 html tabsets slide_level The lowest heading level that defines individual slides X Continued (indent 4 spaces) Use tabset css class to place sub-headers into tabs smaller Use the smaller font size in the presentation? X

1. ordered list # Tabset {.tabset .tabset-fade .tabset-pills} smart Convert straight quotes to curly, dashes to em-dashes, … to ellipses, etc. X X X 2. item 2 ## Tab 1 template Pandoc template to use when rendering file quarterly_report.html). X X X X X i) sub-item 1 A. sub-sub-item 1 text 1 Tabset theme Bootswatch or Beamer theme to use for page X X

(@) A list whose numbering ## Tab 2 Tab 1 Tab 2 toc Add a table of contents at start of document X X X X X X X text 2 continues after text 1 toc_depth The lowest level of headings to add to table of contents X X X X X X ### End tabset End tabset toc_float Float the table of contents to the left of the main content X (@) an interruption

Term 1 : Definition 1 Create a Reusable Template Table Suggestions Citations and Bibliographies | Right | Left | Default | Center | 1. Create a new package with a inst/rmarkdown/templates Several functions format R data into tables Create citations with .bib, .bibtex, .copac, .enl, .json, |------:|:-----|------|:------:| directory .medline, .mods, .ris, .wos, and .xml files | 12 | 12 | 12 | 12 | | 123 | 123 | 123 | 123 | 2. In the directory, Place a folder that contains: --- | 1 | 1 | 1 | 1 | template.yaml (see below) 1. Set bibliography file and CSL 1.0 bibliography: refs.bib skeleton.Rmd (contents of the template) Style file (optional) in the YAML header csl: style.csl - slide bullet 1 any supporting files - slide bullet 2 2. Use citation keys in text --- 3. Install the package (>- to have bullets appear on click) 4. Access template in wizard at File ▶ New File ▶ R Markdown data <- faithful[1:4, ] Smith cited [@smith04]. horizontal rule/slide break: template.yaml ```{r results = 'asis'} Smith cited without author [-@smith04]. knitr::kable(data, caption = "Table with kable”) @smith04 cited in line. *** --- ``` A footnote [^1] name: My Template ```{r results = "asis"} — print(xtable::xtable(data, caption = "Table with xtable”), 3. Render. Bibliography will be [^1]: Here is the footnote. type = "html", html.table.attributes = "border=0")) added to end of document ``` Learn more in ```{r results = "asis"} the stargazer, stargazer::stargazer(data, type = "html", title = "Table xtable, and knitr with stargazer") packages. ``` RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at rmarkdown.rstudio.com • rmarkdown 1.6 • Updated: 2016-02 Data Import : : CHEAT SHEET

R’s tidyverse is built around tidy data stored in tibbles, which are enhanced data frames.
The front side of this sheet shows how to read text files into R with readr.
The reverse side shows how to create tibbles with tibble and to layout tidy data with tidyr.

OTHER TYPES OF DATA
Try one of the following packages to import other types of files
• haven - SPSS, Stata, and SAS files
• readxl - excel files (.xls and .xlsx)
• DBI - databases
• jsonlite - json
• xml2 - XML
• httr - Web APIs
• rvest - HTML (Web Scraping)

Save Data
Save x, an R object, to path, a file path, as:
• Comma delimited file
    write_csv(x, path, na = "NA", append = FALSE, col_names = !append)
• File with arbitrary delimiter
    write_delim(x, path, delim = " ", na = "NA", append = FALSE, col_names = !append)
• CSV for excel
    write_excel_csv(x, path, na = "NA", append = FALSE, col_names = !append)
• String to file
    write_file(x, path, append = FALSE)
• String vector to file, one element per line
    write_lines(x,path, na = "NA", append = FALSE)
• Object to RDS file
    write_rds(x, path, compress = c("none", "gz", "bz2", "xz"), ...)
• Tab delimited files
    write_tsv(x, path, na = "NA", append = FALSE, col_names = !append)

Read Tabular Data - These functions share the common arguments:
    read_*(file, col_names = TRUE, col_types = NULL, locale = default_locale(), na = c("", "NA"), quoted_na = TRUE, comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, guess_max = min(1000, n_max), progress = interactive())

• Comma Delimited Files
    read_csv("file.csv")
    To make file.csv run: write_file(x = "a,b,c\n1,2,3\n4,5,NA", path = "file.csv")
• Semi-colon Delimited Files
    read_csv2("file2.csv")
    write_file(x = "a;b;c\n1;2;3\n4;5;NA", path = "file2.csv")
• Files with Any Delimiter
    read_delim("file.txt", delim = "|")
    write_file(x = "a|b|c\n1|2|3\n4|5|NA", path = "file.txt")
• Fixed Width Files
    read_fwf("file.fwf", col_positions = c(1, 3, 5))
    write_file(x = "a b c\n1 2 3\n4 5 NA", path = "file.fwf")
• Tab Delimited Files
    read_tsv("file.tsv") Also read_table().
    write_file(x = "a\tb\tc\n1\t2\t3\n4\t5\tNA", path = "file.tsv")

USEFUL ARGUMENTS
• Example file
    write_file("a,b,c\n1,2,3\n4,5,NA","file.csv")
    f <- "file.csv"
• No header
    read_csv(f, col_names = FALSE)
• Provide header
    read_csv(f, col_names = c("x", "y", "z"))
• Skip lines
    read_csv(f, skip = 1)
• Read in a subset
    read_csv(f, n_max = 1)
• Missing Values
    read_csv(f, na = c("1", "."))

Read Non-Tabular Data
• Read a file into a single string
    read_file(file, locale = default_locale())
• Read each line into its own string
    read_lines(file, skip = 0, n_max = -1L, na = character(), locale = default_locale(), progress = interactive())
• Read Apache style log files
    read_log(file, col_names = FALSE, col_types = NULL, skip = 0, n_max = -1, progress = interactive())
• Read a file into a raw vector
    read_file_raw(file)
• Read each line into a raw vector
    read_lines_raw(file, skip = 0, n_max = -1L, progress = interactive())

Data types
readr functions guess the types of each column and convert types when appropriate (but will NOT convert strings to factors automatically).
A message shows the type of each column in the result.

    ## Parsed with column specification:
    ## cols(
    ##   age = col_integer(),
    ##   sex = col_character(),
    ##   earn = col_double()
    ## )

Here age is an integer, sex is a character, and earn is a double (numeric).

1. Use problems() to diagnose problems.
    x <- read_csv("file.csv"); problems(x)
2. Use a col_ function to guide parsing.
• col_guess() - the default
• col_character()
• col_double(), col_euro_double()
• col_datetime(format = "") Also col_date(format = ""), col_time(format = "")
• col_factor(levels, ordered = FALSE)
• col_integer()
• col_logical()
• col_number(), col_numeric()
• col_skip()
    x <- read_csv("file.csv", col_types = cols(
      A = col_double(), B = col_logical(), C = col_factor()))
3. Else, read in as character vectors then parse with a parse_ function.
• parse_guess()
• parse_character()
• parse_datetime() Also parse_date() and parse_time()
• parse_double()
• parse_factor()
• parse_integer()
• parse_logical()
• parse_number()
    x$A <- parse_number(x$A)
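The two ways of controlling column types described above (declaring col_ types up front, or reading as character and parsing afterwards) can be sketched together. This example is an illustration only: the file contents and the temporary file are made up, and base R's writeLines() is used to create the file so the sketch does not depend on a particular readr version's write_file() arguments.

```r
library(readr)

# Write a small example file to a temporary location (made-up data)
f <- tempfile(fileext = ".csv")
writeLines(c("A,B,C", "1,2,$3.50", "4,5,NA"), f)

# Declare column types instead of relying on guessing;
# column C is read as character because of the currency symbol
x <- read_csv(f, col_types = cols(
  A = col_double(),
  B = col_integer(),
  C = col_character()
))

# Parse the character column afterwards with a parse_ function
x$C <- parse_number(x$C)
x
```

Supplying col_types also suppresses the column specification message, which can be convenient in scripts.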

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at tidyverse.org • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01 Tibbles - an enhanced data frame Tidy Data with tidyr Split Cells Tidy data is a way to organize tabular data. It provides a consistent data structure across packages. The package provides a new Use these functions to tibble A table is tidy if: Tidy data: S3 class for storing tabular data, the A * B -> C split or combine cells tibble. Tibbles inherit the data frame A B C A B C A B C A * B C into individual, isolated class, but improve three behaviors: values. • Subsetting - [ always returns a new tibble, & [[ and $ always return a vector. separate(data, col, into, sep = "[^[:alnum:]] Each variable is in Each observation, or Makes variables easy Preserves cases during +", remove = TRUE, convert = FALSE, • No partial matching - You must use full its own column case, is in its own row to access as vectors vectorized operations extra = "warn", fill = "warn", ...) column names when subsetting Separate each cell in a column to make • Display - When you print a tibble, R provides a - change the layout of values in a table several columns. concise view of the Reshape Data Use gather() and spread() to reorganize the values of a table into a new layout. 
table3 data that fits on # A tibble: 234 × 6 manufacturer model displ country year rate country year cases pop one screen 1 audi a4 1.8 A 1999 0.7K/19M A 1999 0.7K 19M 2 audi a4 1.8 gather(data, key, value, ..., na.rm = FALSE, spread(data, key, value, fill = NA, convert = FALSE, 3 audi a4 2.0 A 2000 2K/20M A 2000 2K 20M 4 audi a4 2.0 5 audi a4 2.8 convert = FALSE, factor_key = FALSE drop = TRUE, sep = NULL 6 audi a4 2.8 ) ) B 1999 37K/172M B 1999 37K 172 7 audi a4 3.1 8 audi a4 quattro 1.8 B 2000 80K/174M B 2000 80K 174 9 audi a4 quattro 1.8 gather() moves column names into a key spread() moves the unique values of a key 10 audi a4 quattro 2.0 C 1999 212K/1T C 1999 212K 1T # ... with 224 more rows, and 3 # more variables: year , column, gathering the column values into a column into the column names, spreading the # cyl , trans C 2000 213K/1T C 2000 213K 1T single value column. values of a value column across the new columns. ww separate(table3, rate, tibble display table4a table2 into = c("cases", "pop")) country 1999 2000 country year cases country year type count country year cases pop 156 1999 6 auto(l4) 157 1999 6 auto(l4) A 0.7K 2K A 1999 0.7K A 1999 cases 0.7K A 1999 0.7K 19M 158 2008 6 auto(l4) 159 2008 8 auto(s4) B 37K 80K B 1999 37K A 1999 pop 19M A 2000 2K 20M separate_rows(data, ..., sep = "[^[:alnum:].] 160 1999 4 manual(m5) 161 1999 4 auto(l4) 162 2008 4 manual(m5) C 212K 213K C 1999 212K A 2000 cases 2K B 1999 37K 172M 163 2008 4 manual(m5) +", convert = FALSE) 164 2008 4 auto(l4) A 2000 2K A 2000 pop 20M B 2000 80K 174M 165 2008 4 auto(l4) 166 1999 4 auto(l4) B 2000 80K B 1999 cases 37K C 1999 212K 1T Separate each cell in a column to make [ reached getOption("max.print") A large table -- omitted 68 rows ] C 2000 213K B 1999 pop 172M C 2000 213K 1T several rows. Also separate_rows_(). 
key value B 2000 cases 80K to display table3 data frame display B 2000 pop 174M • Control the default appearance with options: C 1999 cases 212K country year rate country year rate C 1999 pop 1T A 1999 0.7K/19M A 1999 0.7K options(tibble.print_max = n, C 2000 cases 213K A 2000 2K/20M A 1999 19M tibble.print_min = m, tibble.width = Inf) C 2000 pop 1T B 1999 37K/172M A 2000 2K gather(table4a, `1999`, `2000`, key value B 2000 80K/174M A 2000 20M • View full data set with View() or glimpse() C 1999 212K/1T B 1999 37K key = "year", value = "cases") spread(table2, type, count) • Revert to data frame with as.data.frame() C 2000 213K/1T B 1999 172M B 2000 80K B 2000 174M CONSTRUCT A TIBBLE IN TWO WAYS C 1999 212K Handle Missing Values C 1999 1T tibble(…) Both drop_na(data, ...) fill(data, ..., .direction = c("down", "up")) replace_na(data, C 2000 213K Construct by columns. C 2000 1T make this replace = list(), ...) tibble(x = 1:3, y = c("a", "b", "c")) Drop rows containing Fill in NA’s in … columns with most tibble NA’s in … columns. recent non-NA values. Replace NA’s by column. separate_rows(table3, rate) tribble(…) x x x A tibble: 3 × 2 x1 x2 x1 x2 x1 x2 x1 x2 x1 x2 x1 x2 Construct by rows. x y A 1 A 1 A 1 A 1 A 1 A 1 unite(data, col, ..., sep = "_", remove = TRUE) tribble( ~x, ~y, B NA D 3 B NA B 1 B NA B 2 Collapse cells across several columns to 1, "a", 1 1 a C NA C NA C 1 C NA C 2 2 2 b D 3 D 3 D 3 D 3 D 3 make a single column. 2, "b", 3 3 c E NA E NA E 3 E NA E 2 3, "c") table5 drop_na(x, x2) fill(x, x2) replace_na(x, list(x2 = 2)) country century year country year as_tibble(x, …) Convert data frame to tibble. Afghan 19 99 Afghan 1999 Afghan 20 0 Afghan 2000 enframe(x, name = "name", value = "value") Expand Tables - quickly create tables with combinations of values Brazil 19 99 Brazil 1999 Convert named vector to a tibble Brazil 20 0 Brazil 2000 complete(data, ..., fill = list()) expand(data, ...) China 19 99 China 1999 x Test whether x is a tibble. 
is_tibble( ) Adds to the data missing combinations of the Create new tibble with all possible combinations China 20 0 China 2000 values of the variables listed in … of the values of the variables listed in … unite(table5, century, year, complete(mtcars, cyl, gear, carb) expand(mtcars, cyl, gear, carb) col = "year", sep = "") RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at tidyverse.org • readr 1.1.0 • tibble 1.2.12 • tidyr 0.6.0 • Updated: 2017-01 Data Transformation with dplyr : : CHEAT SHEET dplyr dplyr functions work with pipes and expect tidy data. In tidy data: Manipulate Cases Manipulate Variables A B C A B C & EXTRACT CASES EXTRACT VARIABLES pipes Row functions return a subset of rows as a new table. Column functions return a set of columns as a new vector or table. Each variable is in Each observation, or x %>% f(y) its own column case, is in its own row becomes f(x, y) filter(.data, …) Extract rows that meet logical pull(.data, var = -1) Extract column values as criteria. filter(iris, Sepal.Length > 7) a vector. Choose by name or index. wwwwww pull(iris, Sepal.Length) Summarise Cases distinct(.data, ..., .keep_all = FALSE) Remove wwww select(.data, …) rows with duplicate values. Extract columns as a table. Also select_if(). These apply summary functions to columns to create a new distinct(iris, Species) select(iris, Sepal.Length, Species) table of summary statistics. Summary functions take vectors as input and return one value (see back). wwwwwwsample_frac(tbl, size = 1, replace = FALSE, wwww weight = NULL, .env = parent.frame()) Randomly summary function Use these helpers with select (), select fraction of rows. e.g. select(iris, starts_with("Sepal")) sample_frac(iris, 0.5, replace = TRUE) summarise(.data, …) contains(match) num_range(prefix, range) :, e.g. mpg:cyl sample_n(tbl, size, replace = FALSE, weight = Compute table of summaries. 
wwwwww ends_with(match) one_of(…) -, e.g, -Species summarise(mtcars, avg = mean(mpg)) NULL, .env = parent.frame()) Randomly select matches(match) starts_with(match) size rows. sample_n(iris, 10, replace = TRUE) www count(x, ..., wt = NULL, sort = FALSE) slice(.data, …) Select rows by position. MAKE NEW VARIABLES Count number of rows in each group defined slice(iris, 10:15) by the variables in … Also tally(). count(iris, Species) These apply vectorized functions to columns. Vectorized funs take top_n(x, n, wt) Select and order top n entries (by vectors as input and return vectors of the same length as output www group if grouped data). top_n(iris, 5, Sepal.Width) (see back). VARIATIONS wwwwww vectorized function summarise_all() - Apply funs to every column. summarise_at() - Apply funs to specific columns. mutate(.data, …) summarise_if() - Apply funs to all cols of one type. Logical and boolean operators to use with filter() Compute new column(s). mutate(mtcars, gpm = 1/mpg) < <= is.na() %in% | xor() > >= !is.na() ! & wwwwww transmute(.data, …) Group Cases See ?base::logic and ?Comparison for help. Compute new column(s), drop others. transmute(mtcars, gpm = 1/mpg) Use group_by() to create a "grouped" copy of a table. dplyr functions will manipulate each "group" separately and www mutate_all(.tbl, .funs, …) Apply funs to every then combine the results. ARRANGE CASES column. Use with funs(). Also mutate_if(). mutate_all(faithful, funs(log(.), log2(.))) arrange(.data, …) Order rows by values of a mutate_if(iris, is.numeric, funs(log(.))) mtcars %>% column or columns (low to high), use with wwww group_by(cyl) %>% desc() to order from high to low. arrange(mtcars, mpg) mutate_at(.tbl, .cols, .funs, …) Apply funs to wwwwww summarise(avg = mean(mpg)) wwwwwwarrange(mtcars, desc(mpg)) specific columns. Use with funs(), vars() and the helper functions for select(). 
w w w mutate_at(iris, vars( -Species), funs(log(.))) group_by(.data, ..., add = ungroup(x, …) ADD CASES add_column(.data, ..., .before = NULL, .after = FALSE) Returns ungrouped copy add_row(.data, ..., .before = NULL, .after = NULL) NULL) Add new column(s). Also add_count(), Returns copy of table of table. add_tally(). add_column(mtcars, new = 1:32) grouped by … ungroup(g_iris) Add one or more rows to a table. g_iris <- group_by(iris, Species) add_row(faithful, eruptions = 1, waiting = 1) wwwwww rename(.data, …) Rename columns. wwwwww rename(iris, Length = Sepal.Length) wwwww RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more with browseVignettes(package = c("dplyr", "tibble")) • dplyr 0.7.0 • tibble 1.2.0 • Updated: 2017-03 Vector Functions Summary Functions Combine Tables TO USE WITH MUTATE () TO USE WITH SUMMARISE () COMBINE VARIABLES COMBINE CASES dplyr mutate() and transmute() apply vectorized summarise() applies summary functions to x y functions to columns to create new columns. columns to create a new table. Summary A B C A B D A B C A B D A B C a t 1 a t 3 a t 1 a t 3 a t 1 Vectorized functions take vectors as input and functions take vectors as input and return single b u 2 b u 2 b u 2 b u 2 b u 2 return vectors of the same length as output. values as output. c v 3 + d w 1 = c v 3 d w 1 x c v 3 A B C vectorized function summary function Use bind_cols() to paste tables beside each C v 3 other as they are. + y d w 4

OFFSETS COUNTS bind_cols(…) Returns tables placed side by dplyr::n() - number of values/rows side as a single table. Use bind_rows() to paste tables below each dplyr::lag() - Offset elements by 1 BE SURE THAT ROWS ALIGN. dplyr:: - Offset elements by -1 dplyr::n_distinct() - # of uniques other as they are. lead() sum(!is.na()) - # of non-NA’s CUMULATIVE AGGREGATES LOCATION Use a "Mutating Join" to join one table to DF A B C bind_rows(…, .id = NULL) dplyr::cumall() - Cumulative all() columns from another, matching values with x a t 1 Returns tables one on top of the other mean() - mean, also mean(!is.na()) x b u 2 dplyr:: - Cumulative any() the rows that they correspond to. Each join x c v 3 as a single table. Set .id to a column cumany() median() - median cummax() - Cumulative max() retains a different combination of values from z c v 3 name to add a column of the original dplyr::cummean() - Cumulative mean() the tables. z d w 4 table names (as pictured) cummin() - Cumulative min() LOGICALS cumprod() - Cumulative prod() mean() - Proportion of TRUE’s A B C D left_join(x, y, by = NULL, A B C intersect(x, y, …) - Cumulative sum() sum() - # of TRUE’s a t 1 3 copy=FALSE, suffix=c(“.x”,“.y”),…) c v 3 Rows that appear in both x and y. cumsum() b u 2 2 c v 3 NA Join matching values from y to x. RANKINGS POSITION/ORDER A B C setdiff(x, y, …) a t 1 A B C D right_join(x, y, by = NULL, copy = Rows that appear in x but not y. dplyr::cume_dist() - Proportion of all values <= dplyr::first() - first value b u 2 a t 1 3 FALSE, suffix=c(“.x”,“.y”),…) dplyr::dense_rank() - rank with ties = min, no dplyr::last() - last value b u 2 2 Join matching values from x to y. A B C union(x, y, …) gaps dplyr::nth() - value in nth location of vector d w NA 1 a t 1 Rows that appear in x or y. dplyr::min_rank() - rank with ties = min b u 2 A B C D inner_join(x, y, by = NULL, copy = (Duplicates removed). 
union_all() dplyr::ntile() - bins into n bins RANK c v 3 a t 1 3 FALSE, suffix=c(“.x”,“.y”),…) d w 4 retains duplicates. dplyr::percent_rank() - min_rank scaled to [0,1] quantile() - nth quantile b u 2 2 Join data. Retain only rows with dplyr::row_number() - rank with ties = "first" min() - minimum value matches. max() - maximum value MATH A B C D full_join(x, y, by = NULL, Use setequal() to test whether two data sets +, - , *, /, ^, %/%, %% - arithmetic ops SPREAD a t 1 3 copy=FALSE, suffix=c(“.x”,“.y”),… contain the exact same rows (in any order). b u 2 2 ) log(), log2(), log10() - logs IQR() - Inter-Quartile Range c v 3 NA Join data. Retain all values, all rows. <, <=, >, >=, !=, == - logical comparisons mad() - median absolute deviation d w NA 1 dplyr::between() - x >= left & x <= right sd() - standard deviation EXTRACT ROWS dplyr::near() - safe == for floating point var() - variance x y numbers A B.x C B.y D Use by = c("col1", "col2", …) to A B C A B D a t 1 t 3 a t 1 a t 3 MISC specify one or more common b u 2 b u 2 b u 2 u 2 + = c v 3 NA NA columns to match on. c v 3 d w 1 dplyr::case_when() - multi-case if_else() Row Names left_join(x, y, by = "A") dplyr::coalesce() - first non-NA values by element across a set of vectors Tidy data does not use rownames, which store a variable outside of the columns. To work with the A.x B.x C A.y B.y Use a named vector, by = c("col1" = Use a "Filtering Join" to filter one table against dplyr::if_else() - element-wise if() + else() rownames, first move them into a column. a t 1 d w "col2"), to match on columns that the rows of another. dplyr::na_if() - replace specific values with NA b u 2 b u c v 3 a t have different names in each table. pmax() - element-wise max() C A B A B rownames_to_column() left_join(x, y, by = c("C" = "D")) A B C semi_join(x, y, by = NULL, …) pmin() - element-wise min() 1 a t 1 a t Move row names into col. a t 1 Return rows of x that have a match in y. 
dplyr::recode() - Vectorized switch() 2 b u 2 b u b u 2 a <- rownames_to_column(iris, var A1 B1 C A2 B2 Use to specify the suffix to USEFUL TO SEE WHAT WILL BE JOINED. 3 c v 3 c v suffix dplyr::recode_factor() - Vectorized switch() = "C") a t 1 d w give to unmatched columns that b u 2 b u for factors A B C anti_join(x, y, by = NULL, …) c v 3 a t have the same name in both tables. A B C A B column_to_rownames() left_join(x, y, by = c("C" = "D"), suffix = c v 3 Return rows of x that do not have a 1 a t 1 a t Move col in row names. match in y. USEFUL TO SEE WHAT WILL 2 b u 2 b u c("1", "2")) 3 c v 3 c v column_to_rownames(a, var = "C") NOT BE JOINED.

Also has_rownames(), remove_rownames()
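The mutating and filtering joins above can be tried on the small x and y tables pictured on this sheet. The sketch below rebuilds those tables as plain data frames (an assumption about their exact types) and stores each result under a name chosen here for illustration.

```r
library(dplyr)

# The two example tables from the sheet
x <- data.frame(A = c("a", "b", "c"), B = c("t", "u", "v"), C = 1:3,
                stringsAsFactors = FALSE)
y <- data.frame(A = c("a", "b", "d"), B = c("t", "u", "w"), D = c(3, 2, 1),
                stringsAsFactors = FALSE)

# Mutating joins: add columns from y to x, matching on A and B
joined_left  <- left_join(x, y, by = c("A", "B"))   # keeps every row of x
joined_inner <- inner_join(x, y, by = c("A", "B"))  # keeps only matched rows
joined_full  <- full_join(x, y, by = c("A", "B"))   # keeps all rows of both

# Filtering joins: keep or drop rows of x depending on a match in y
matched   <- semi_join(x, y, by = "A")  # rows of x with a match in y
unmatched <- anti_join(x, y, by = "A")  # rows of x without a match in y

joined_left
unmatched
```

Running semi_join() and anti_join() before a mutating join is a quick way to check which rows will and will not find a partner.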

Data Visualization with ggplot2 : : CHEAT SHEET

Basics

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms, visual marks that represent data points.

To display values, map variables in the data to visual properties of the geom (aesthetics) like size, color, and x and y locations.

Complete the template below to build a graph.

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>), stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>

The data, the geom function, and the mappings are required. Everything else is not required; sensible defaults are supplied.

ggplot(data = mpg, aes(x = cty, y = hwy)) Begins a plot that you finish by adding layers to. Add one geom function per layer.

qplot(x = cty, y = hwy, data = mpg, geom = "point") Creates a complete plot with given data, geom, and mappings. Supplies many useful defaults.

last_plot() Returns the last plot.

ggsave("plot.png", width = 5, height = 5) Saves last plot as a 5 in x 5 in file named "plot.png" in working directory. Matches file type to file extension.

Geoms

Use a geom function to represent data points, use the geom's aesthetic properties to represent variables. Each function returns a layer. The aesthetics each geom understands are listed after the dash.

GRAPHICAL PRIMITIVES

a <- ggplot(economics, aes(date, unemploy))
b <- ggplot(seals, aes(x = long, y = lat))

a + geom_blank() (Useful for expanding limits)
b + geom_curve(aes(yend = lat + 1, xend = long + 1, curvature = z)) - x, xend, y, yend, alpha, angle, color, curvature, linetype, size
a + geom_path(lineend = "butt", linejoin = "round", linemitre = 1) - x, y, alpha, color, group, linetype, size
a + geom_polygon(aes(group = group)) - x, y, alpha, color, fill, group, linetype, size
b + geom_rect(aes(xmin = long, ymin = lat, xmax = long + 1, ymax = lat + 1)) - xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size
a + geom_ribbon(aes(ymin = unemploy - 900, ymax = unemploy + 900)) - x, ymax, ymin, alpha, color, fill, group, linetype, size

LINE SEGMENTS

common aesthetics: x, y, alpha, color, linetype, size

b + geom_abline(aes(intercept = 0, slope = 1))
b + geom_hline(aes(yintercept = lat))
b + geom_vline(aes(xintercept = long))
b + geom_segment(aes(yend = lat + 1, xend = long + 1))
b + geom_spoke(aes(angle = 1:1155, radius = 1))

ONE VARIABLE

continuous
c <- ggplot(mpg, aes(hwy)); c2 <- ggplot(mpg)

c + geom_area(stat = "bin") - x, y, alpha, color, fill, linetype, size
c + geom_density(kernel = "gaussian") - x, y, alpha, color, fill, group, linetype, size, weight
c + geom_dotplot() - x, y, alpha, color, fill
c + geom_freqpoly() - x, y, alpha, color, group, linetype, size
c + geom_histogram(binwidth = 5) - x, y, alpha, color, fill, linetype, size, weight
c2 + geom_qq(aes(sample = hwy)) - x, y, alpha, color, fill, linetype, size, weight

discrete
d <- ggplot(mpg, aes(fl))

d + geom_bar() - x, alpha, color, fill, linetype, size, weight

TWO VARIABLES

continuous x, continuous y
e <- ggplot(mpg, aes(cty, hwy))

e + geom_label(aes(label = cty), nudge_x = 1, nudge_y = 1, check_overlap = TRUE) - x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust
e + geom_jitter(height = 2, width = 2) - x, y, alpha, color, fill, shape, size
e + geom_point() - x, y, alpha, color, fill, shape, size, stroke
e + geom_quantile() - x, y, alpha, color, group, linetype, size, weight
e + geom_rug(sides = "bl") - x, y, alpha, color, linetype, size
e + geom_smooth(method = lm) - x, y, alpha, color, fill, group, linetype, size, weight
e + geom_text(aes(label = cty), nudge_x = 1, nudge_y = 1, check_overlap = TRUE) - x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

discrete x, continuous y
f <- ggplot(mpg, aes(class, hwy))

f + geom_col() - x, y, alpha, color, fill, group, linetype, size
f + geom_boxplot() - x, y, lower, middle, upper, ymax, ymin, alpha, color, fill, group, linetype, shape, size, weight
f + geom_dotplot(binaxis = "y", stackdir = "center") - x, y, alpha, color, fill, group
f + geom_violin(scale = "area") - x, y, alpha, color, fill, group, linetype, size, weight

discrete x, discrete y
g <- ggplot(diamonds, aes(cut, color))

g + geom_count() - x, y, alpha, color, fill, shape, size, stroke

continuous bivariate distribution
h <- ggplot(diamonds, aes(carat, price))

h + geom_bin2d(binwidth = c(0.25, 500)) - x, y, alpha, color, fill, linetype, size, weight
h + geom_density2d() - x, y, alpha, colour, group, linetype, size
h + geom_hex() - x, y, alpha, colour, fill, size

continuous function
i <- ggplot(economics, aes(date, unemploy))

i + geom_area() - x, y, alpha, color, fill, linetype, size
i + geom_line() - x, y, alpha, color, group, linetype, size
i + geom_step(direction = "hv") - x, y, alpha, color, group, linetype, size

visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit-se, ymax = fit+se))

j + geom_crossbar(fatten = 2) - x, y, ymax, ymin, alpha, color, fill, group, linetype, size
j + geom_errorbar() - x, ymax, ymin, alpha, color, group, linetype, size, width (also geom_errorbarh())
j + geom_linerange() - x, ymin, ymax, alpha, color, group, linetype, size
j + geom_pointrange() - x, y, ymin, ymax, alpha, color, fill, group, linetype, shape, size

maps
data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests)))
map <- map_data("state")
k <- ggplot(data, aes(fill = murder))

k + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat) - map_id, alpha, color, fill, linetype, size

THREE VARIABLES

seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
l <- ggplot(seals, aes(long, lat))

l + geom_contour(aes(z = z)) - x, y, z, alpha, colour, group, linetype, size, weight
l + geom_raster(aes(fill = z), hjust = 0.5, vjust = 0.5, interpolate = FALSE) - x, y, alpha, fill
l + geom_tile(aes(fill = z)) - x, y, alpha, color, fill, linetype, size, width
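As a minimal sketch of how the pieces above combine, the example below begins a plot with ggplot(), then adds one geom function per layer. The choice of geoms and the alpha value are illustrative assumptions, not something the cheat sheet prescribes.

```r
library(ggplot2)

# Begin a plot: data plus aesthetic mappings shared by all layers
p <- ggplot(data = mpg, mapping = aes(x = cty, y = hwy)) +
  geom_point(alpha = 0.5) +                           # layer 1: one mark per row
  geom_smooth(method = "lm", formula = y ~ x)         # layer 2: fitted line with band

# Each geom function added one layer to the plot object
n_layers <- length(p$layers)

# print(p) draws the plot; ggsave("plot.png", p, width = 5, height = 5)
# would write it to disk (left commented out to avoid creating files).
```

Because each geom returns a layer, swapping geom_point() for geom_jitter() changes the marks without touching the data or the mappings.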

Learn more at http://ggplot2.tidyverse.org • ggplot2 2.1.0 • Updated: 2016-11

Stats - An alternative way to build a layer

A stat builds new variables to plot (e.g., count, prop).

Visualize a stat by changing the default stat of a geom function, geom_bar(stat="count"), or by using a stat function, stat_count(geom="bar"), which calls a default geom to make a layer (equivalent to a geom function). Use ..name.. syntax to map stat variables to aesthetics.

i + stat_density2d(aes(fill = ..level..), geom = "polygon")
Here stat_density2d is the stat function, aes() holds the geom mappings, ..level.. is a variable created by the stat, and geom = "polygon" is the geom to use.

The aesthetics each stat understands come before the bar; the variables it creates come after it.

c + stat_bin(binwidth = 1, origin = 10) - x, y | ..count.., ..ncount.., ..density.., ..ndensity..
c + stat_count(width = 1) - x, y | ..count.., ..prop..
c + stat_density(adjust = 1, kernel = "gaussian") - x, y | ..count.., ..density.., ..scaled..

e + stat_bin_2d(bins = 30, drop = T) - x, y, fill | ..count.., ..density..
e + stat_bin_hex(bins = 30) - x, y, fill | ..count.., ..density..
e + stat_density_2d(contour = TRUE, n = 100) - x, y, color, size | ..level..
e + stat_ellipse(level = 0.95, segments = 51, type = "t")

l + stat_contour(aes(z = z)) - x, y, z, order | ..level..
l + stat_summary_hex(aes(z = z), bins = 30, fun = max) - x, y, z, fill | ..value..
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean) - x, y, z, fill | ..value..

f + stat_boxplot(coef = 1.5) - x, y | ..lower.., ..middle.., ..upper.., ..width.., ..ymin.., ..ymax..
f + stat_ydensity(kernel = "gaussian", scale = "area") - x, y | ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..

e + stat_ecdf(n = 40) - x, y | ..x.., ..y..
e + stat_quantile(quantiles = c(0.1, 0.9), formula = y ~ log(x), method = "rq") - x, y | ..quantile..
e + stat_smooth(method = "lm", formula = y ~ x, se = T, level = 0.95) - x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..

ggplot() + stat_function(aes(x = -3:3), n = 99, fun = dnorm, args = list(sd = 0.5)) - x | ..x.., ..y..
e + stat_identity(na.rm = TRUE)
ggplot() + stat_qq(aes(sample = 1:100), dist = qt, dparam = list(df = 5)) - sample, x, y | ..sample.., ..theoretical..
e + stat_sum() - x, y, size | ..n.., ..prop..
e + stat_summary(fun.data = "mean_cl_boot")
h + stat_summary_bin(fun.y = "mean", geom = "bar")
e + stat_unique()

Scales

Scales map data values to the visual values of an aesthetic. To change a mapping, add a new scale.

(n <- d + geom_bar(aes(fill = fl)))

n + scale_fill_manual(values = c("skyblue", "royalblue", "blue", "navy"), limits = c("d", "e", "p", "r"), breaks = c("d", "e", "p", "r"), name = "fuel", labels = c("D", "E", "P", "R"))

The function name is built from scale_, the aesthetic to adjust (fill), and the prepackaged scale to use (manual). The arguments are: values, scale-specific arguments; limits, the range of values to include in the mapping; breaks, the breaks to use in the legend/axis; name, the title to use in the legend/axis; labels, the labels to use in the legend/axis.

GENERAL PURPOSE SCALES
Use with most aesthetics

scale_*_continuous() - map continuous values to visual ones
scale_*_discrete() - map discrete values to visual ones
scale_*_identity() - use data values as visual ones
scale_*_manual(values = c()) - map discrete values to manually chosen visual ones
scale_*_date(date_labels = "%m/%d", date_breaks = "2 weeks") - treat data values as dates.
scale_*_datetime() - treat data x values as date times. Use same arguments as scale_x_date(). See ?strptime for label formats.

X & Y LOCATION SCALES
Use with x or y aesthetics (x shown here)

scale_x_log10() - Plot x on log10 scale
scale_x_reverse() - Reverse direction of x axis
scale_x_sqrt() - Plot x on square root scale

COLOR AND FILL SCALES (DISCRETE)

n <- d + geom_bar(aes(fill = fl))
n + scale_fill_brewer(palette = "Blues") For palette choices: RColorBrewer::display.brewer.all()
n + scale_fill_grey(start = 0.2, end = 0.8, na.value = "red")

COLOR AND FILL SCALES (CONTINUOUS)

o <- c + geom_dotplot(aes(fill = ..x..))
o + scale_fill_distiller(palette = "Blues")
o + scale_fill_gradient(low = "red", high = "yellow")
o + scale_fill_gradient2(low = "red", high = "blue", mid = "white", midpoint = 25)
o + scale_fill_gradientn(colours = topo.colors(6)) Also: rainbow(), heat.colors(), terrain.colors(), cm.colors(), RColorBrewer::brewer.pal()

SHAPE AND SIZE SCALES

p <- e + geom_point(aes(shape = fl, size = cyl))
p + scale_shape() + scale_size()
p + scale_shape_manual(values = c(3:7))
p + scale_radius(range = c(1,6))
p + scale_size_area(max_size = 6)

Coordinate Systems

r <- d + geom_bar()

r + coord_cartesian(xlim = c(0, 5)) - xlim, ylim - The default cartesian coordinate system
r + coord_fixed(ratio = 1/2) - ratio, xlim, ylim - Cartesian coordinates with fixed aspect ratio between x and y units
r + coord_flip() - xlim, ylim - Flipped Cartesian coordinates
r + coord_polar(theta = "x", direction = 1) - theta, start, direction - Polar coordinates
r + coord_trans(ytrans = "sqrt") - xtrans, ytrans, limx, limy - Transformed cartesian coordinates. Set xtrans and ytrans to the name of a window function.
π + coord_quickmap()
π + coord_map(projection = "ortho", orientation = c(41, -74, 0)) - projection, orientation, xlim, ylim - Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.)

Position Adjustments

Position adjustments determine how to arrange geoms that would otherwise occupy the same space.

s <- ggplot(mpg, aes(fl, fill = drv))

s + geom_bar(position = "dodge") Arrange elements side by side
s + geom_bar(position = "fill") Stack elements on top of one another, normalize height
e + geom_point(position = "jitter") Add random noise to X and Y position of each element to avoid overplotting
e + geom_label(position = "nudge") Nudge labels away from points
s + geom_bar(position = "stack") Stack elements on top of one another

Each position adjustment can be recast as a function with manual width and height arguments
s + geom_bar(position = position_dodge(width = 1))

Themes

r + theme_bw() White background with grid lines
r + theme_gray() Grey background (default theme)
r + theme_dark() Dark for contrast
r + theme_classic(), r + theme_light(), r + theme_linedraw(), r + theme_minimal() Minimal themes
r + theme_void() Empty theme

Faceting

Facets divide a plot into subplots based on the values of one or more discrete variables.

t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

t + facet_grid(. ~ fl) facet into columns based on fl
t + facet_grid(year ~ .) facet into rows based on year
t + facet_grid(year ~ fl) facet into both rows and columns
t + facet_wrap(~ fl) wrap facets into a rectangular layout

Set scales to let axis limits vary across facets
t + facet_grid(drv ~ fl, scales = "free") x and y axis limits adjust to individual facets
"free_x" - x axis limits adjust
"free_y" - y axis limits adjust

Set labeller to adjust facet labels
t + facet_grid(. ~ fl, labeller = label_both) Facet labels read fl: c, fl: d, fl: e, fl: p, fl: r
t + facet_grid(fl ~ ., labeller = label_bquote(alpha ^ .(fl)))
t + facet_grid(. ~ fl, labeller = label_parsed) Facet labels read c, d, e, p, r

Labels

t + labs(x = "New x axis label", y = "New y axis label", title = "Add a title above the plot", subtitle = "Add a subtitle below title", caption = "Add a caption below plot", <AES> = "New <AES> legend title")
Use scale functions to update legend labels.

t + annotate(geom = "text", x = 8, y = 9, label = "A") The geom argument names the geom to place; the remaining arguments are manual values for the geom's aesthetics.

Legends

n + theme(legend.position = "bottom") Place legend at "bottom", "top", "left", or "right"
n + guides(fill = "none") Set legend type for each aesthetic: colorbar, legend, or none (no legend)
n + scale_fill_discrete(name = "Title", labels = c("A", "B", "C", "D", "E")) Set legend title and labels with a scale function.

Zooming

Without clipping (preferred)
t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))

With clipping (removes unseen data points)
t + xlim(0, 100) + ylim(10, 20)
t + scale_x_continuous(limits = c(0, 100)) + scale_y_continuous(limits = c(0, 100))
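The difference between the two zooming approaches can be checked by inspecting the data each plot actually uses. This is a sketch under assumptions: the limits c(10, 20) are arbitrary, and it relies on layer_data() returning the built layer data with out-of-range positions already set to NA.

```r
library(ggplot2)

t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# Without clipping: only the viewport changes, every row is kept
zoomed <- t + coord_cartesian(xlim = c(10, 20))

# With clipping: values outside the scale limits are censored (set to NA)
clipped <- t + xlim(10, 20)

kept_zoomed  <- sum(!is.na(layer_data(zoomed)$x))
kept_clipped <- sum(!is.na(layer_data(clipped)$x))
```

This matters most when a stat is involved: a smoother or boxplot added to the clipped plot is computed from the surviving points only, while the zoomed plot computes it from all of the data.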

Apply Functions

Map functions apply a function iteratively to each element of a list or vector.

map(.x, .f, …) Apply a function to each element of a list or vector. map(x, is.logical)
map2(.x, .y, .f, …) Apply a function to pairs of elements from two lists, vectors. map2(x, y, sum)
pmap(.l, .f, …) Apply a function to groups of elements from list of lists, vectors. pmap(list(x, y, z), sum, na.rm = TRUE)
invoke_map(.f, .x = list(NULL), …, .env = NULL) Run each function in a list. Also invoke. l <- list(var, sd); invoke_map(l, x = 1:9)
lmap(.x, .f, ...) Apply function to each list-element of a list or vector.
imap(.x, .f, ...) Apply .f to each element of a list or vector and its index.

OUTPUT

map(), map2(), pmap(), imap and invoke_map each return a list. Use a suffixed version to return the results as a specific type of flat vector, e.g. map2_chr, pmap_lgl, etc.

Use walk, walk2, and pwalk to trigger side effects. Each returns its input invisibly.

function - returns
map - list
map_chr - character vector
map_dbl - double (numeric) vector
map_dfc - data frame (column bind)
map_dfr - data frame (row bind)
map_int - integer vector
map_lgl - logical vector
walk - triggers side effects, returns the input invisibly

SHORTCUTS - within a purrr function:

"name" becomes function(x) x[["name"]], e.g. map(l, "a") extracts a from each element of l
~ .x becomes function(x) x, e.g. map(l, ~ 2 +.x) becomes map(l, function(x) 2 + x )
~ .x .y becomes function(.x, .y) .x .y, e.g. map2(l, p, ~ .x +.y ) becomes map2(l, p, function(l, p) l + p )
~ ..1 ..2 etc becomes function(..1, ..2, etc) ..1 ..2 etc, e.g. pmap(list(a, b, c), ~ ..3 + ..1 - ..2) becomes pmap(list(a, b, c), function(a, b, c) c + a - b)

Work with Lists

FILTER LISTS

pluck(.x, ..., .default = NULL) Select an element by name or index, pluck(x, "b"), or its attribute with attr_getter. pluck(x, "b", attr_getter("n"))
keep(.x, .p, …) Select elements that pass a logical test. keep(x, is.na)
discard(.x, .p, …) Select elements that do not pass a logical test. discard(x, is.na)
compact(.x, .p = identity) Drop empty elements. compact(x)
head_while(.x, .p, …) Return head elements until one does not pass. Also tail_while. head_while(x, is.character)

SUMMARISE LISTS

every(.x, .p, …) Do all elements pass a test? every(x, is.character)
some(.x, .p, …) Do some elements pass a test? some(x, is.character)
has_element(.x, .y) Does a list contain an element? has_element(x, "foo")
detect(.x, .f, ..., .right = FALSE, .p) Find first element to pass. detect(x, is.character)
detect_index(.x, .f, ..., .right = FALSE, .p) Find index of first element to pass. detect_index(x, is.character)
vec_depth(x) Return depth (number of levels of indexes). vec_depth(x)

TRANSFORM LISTS

modify(.x, .f, ...) Apply function to each element. Also map, map_chr, map_dbl, map_dfc, map_dfr, map_int, map_lgl. modify(x, ~.+ 2)
modify_at(.x, .at, .f, ...) Apply function to elements by name or index. Also map_at. modify_at(x, "b", ~.+ 2)
modify_if(.x, .p, .f, ...) Apply function to elements that pass a test. Also map_if. modify_if(x, is.numeric, ~.+2)
modify_depth(.x, .depth, .f, ...) Apply function to each element at a given level of a list. modify_depth(x, 1, ~.+ 2)

WORK WITH LISTS

array_tree(array, margin = NULL) Turn array into list. Also array_branch. array_tree(x, margin = 3)
cross2(.x, .y, .filter = NULL) All combinations of .x and .y. Also cross, cross3, cross_df. cross2(1:3, 4:6)
set_names(x, nm = x) Set the names of a vector/list directly or with a function. set_names(x, c("p", "q", "r")) set_names(x, tolower)

RESHAPE LISTS

flatten(.x) Remove a level of indexes from a list. Also flatten_chr, flatten_dbl, flatten_dfc, flatten_dfr, flatten_int, flatten_lgl. flatten(x)
transpose(.l, .names = NULL) Transposes the index order in a multi-level list. transpose(x)

JOIN (TO) LISTS

append(x, values, after = length(x)) Add to end of list. append(x, list(d = 1))
prepend(x, values, before = 1) Add to start of list. prepend(x, list(d = 1))
splice(…) Combine objects into a list, storing S3 objects as sub-lists. splice(x, y, "foo")

Reduce Lists

reduce(.x, .f, ..., .init) Apply function recursively to each element of a list or vector. Also reduce_right, reduce2, reduce2_right. reduce(x, sum)
accumulate(.x, .f, ..., .init) Reduce, but also return intermediate results. Also accumulate_right. accumulate(x, sum)

Modify function behavior

compose() Compose multiple functions.
lift() Change the type of input a function takes. Also lift_dl, lift_dv, lift_ld, lift_lv, lift_vd, lift_vl.
rerun() Rerun expression n times.
negate() Negate a predicate function (a pipe friendly !)
partial() Create a version of a function that has some args preset to values.
safely() Modify function to return list of results and errors.
quietly() Modify function to return list of results, output, messages, warnings.
possibly() Modify function to return default value whenever an error occurs (instead of error).
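A short sketch of the map family, the formula shortcuts, and the reduce and safely helpers described above. The list x is made-up illustration data, and only functions that are still current in purrr are used.

```r
library(purrr)

# Hypothetical input list (assumed for illustration)
x <- list(a = 1:3, b = 4:6, c = 7:9)

# Suffixed map variants return flat vectors instead of lists
means  <- map_dbl(x, mean)                 # double vector, names a, b, c
firsts <- map_int(x, 1)                    # position shortcut: first element of each

# Formula shortcut with two inputs: ~ .x + .y
sums <- map2_dbl(1:3, 4:6, ~ .x + .y)

# Keep only the elements whose sum exceeds 10
big <- keep(x, ~ sum(.x) > 10)

# Reduce to a single value, or accumulate the intermediate results
total   <- reduce(1:4, `+`)
running <- accumulate(1:4, `+`)

# safely() captures the error instead of stopping
safe_log <- safely(log)
res <- safe_log("a")
```

Choosing map_dbl() over map() here is a type check as much as a convenience: it fails loudly if any element does not produce a single number.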

Learn more at purrr.tidyverse.org • purrr 0.2.3 • Updated: 2017-09

Nested Data

A nested data frame stores individual tables within the cells of a larger, organizing table.

[Figure: the nested data frame n_iris with columns Species and data, one row each for setosa, versicolor, and virginica. The cells n_iris$data[[1]], n_iris$data[[2]], and n_iris$data[[3]] each hold a table with columns Sepal.L, Sepal.W, Petal.L, Petal.W.]

Use a nested data frame to:
• preserve relationships between observations and subsets of data
• manipulate many sub-tables at once with the purrr functions map(), map2(), or pmap().

Use a two step process to create a nested data frame:
1. Group the data frame into groups with dplyr::group_by()
2. Use nest() to create a nested data frame with one row per group

n_iris <- iris %>% group_by(Species) %>% nest()

tidyr::nest(data, ..., .key = data)
For grouped data, moves groups into cells as data frames.

Unnest a nested data frame with unnest():

n_iris %>% unnest()

tidyr::unnest(data, ..., .drop = NA, .id = NULL, .sep = NULL)
Unnests a nested data frame.

List Column Workflow

Nested data frames use a list column, a list that is stored as a column vector of a data frame. A typical workflow for list columns:

1 Make a list column
n_iris <- iris %>% group_by(Species) %>% nest()

2 Work with list columns
mod_fun <- function(df) lm(Sepal.Length ~ ., data = df)
m_iris <- n_iris %>% mutate(model = map(data, mod_fun))

3 Simplify the list column
b_fun <- function(mod) coefficients(mod)[[1]]
m_iris %>% transmute(Species, beta = map_dbl(model, b_fun))

[Figure: the result has columns Species and beta, with values setosa 2.35, versicolor 1.89, virginica 0.69.]

1. MAKE A LIST COLUMN - You can create list columns with functions in the tibble and dplyr packages, as well as tidyr's nest()

tibble::tribble(…) Makes list column when needed
tribble(~max, ~seq, 3, 1:3, 4, 1:4, 5, 1:5)

tibble::tibble(…) Saves list input as list columns
tibble(max = c(3, 4, 5), seq = list(1:3, 1:4, 1:5))

tibble::enframe(x, name = "name", value = "value") Converts multi-level list to tibble with list cols
enframe(list('3'=1:3, '4'=1:4, '5'=1:5), 'max', 'seq')

dplyr::mutate(.data, …) Also transmute() Returns list col when result returns list.
mtcars %>% mutate(seq = map(cyl, seq))

dplyr::summarise(.data, …) Returns list col when result is wrapped with list()
mtcars %>% group_by(cyl) %>% summarise(q = list(quantile(mpg)))

2. WORK WITH LIST COLUMNS - Use the purrr functions map(), map2(), and pmap() to apply a function that returns a result element-wise to the cells of a list column. walk(), walk2(), and pwalk() work the same way, but return a side effect.

purrr::map(.x, .f, ...) Apply .f element-wise to .x as .f(.x)
n_iris %>% mutate(n = map(data, dim))

purrr::map2(.x, .y, .f, ...) Apply .f element-wise to .x and .y as .f(.x, .y)
m_iris %>% mutate(n = map2(data, model, list))

purrr::pmap(.l, .f, ...) Apply .f element-wise to vectors saved in .l
m_iris %>% mutate(n = pmap(list(data, model, data), list))

3. SIMPLIFY THE LIST COLUMN (into a regular column)

Use the purrr functions map_lgl(), map_int(), map_dbl(), map_chr(), as well as tidyr's unnest() to reduce a list column into a regular column.

purrr::map_lgl(.x, .f, ...) Apply .f element-wise to .x, return a logical vector
n_iris %>% transmute(n = map_lgl(data, is.matrix))

purrr::map_int(.x, .f, ...) Apply .f element-wise to .x, return an integer vector
n_iris %>% transmute(n = map_int(data, nrow))

purrr::map_dbl(.x, .f, ...) Apply .f element-wise to .x, return a double vector
n_iris %>% transmute(n = map_dbl(data, nrow))

purrr::map_chr(.x, .f, ...) Apply .f element-wise to .x, return a character vector
n_iris %>% transmute(n = map_chr(data, nrow))
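The three workflow steps can be run end to end as below. This is a sketch that assumes a recent tidyr, where unnest() expects the list column to be named explicitly; the object names follow the cheat sheet.

```r
library(dplyr)
library(tidyr)
library(purrr)

# 1. Make a list column: one row per species, sub-tables in `data`
n_iris <- iris %>%
  group_by(Species) %>%
  nest()

# 2. Work with the list column: fit one linear model per sub-table
m_iris <- n_iris %>%
  mutate(model = map(data, ~ lm(Sepal.Length ~ ., data = .x)))

# 3. Simplify: pull the intercept of each model into a regular column
betas <- m_iris %>%
  transmute(Species, beta = map_dbl(model, ~ coef(.x)[[1]])) %>%
  ungroup()

# Round trip: unnesting restores the original number of rows
flat <- n_iris %>% unnest(data)
```

Keeping the models in a list column next to the data they were fit on preserves the link between each subset and its result, which is the main reason to nest rather than split into separate objects.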

String manipulation with stringr : : CHEAT SHEET

The stringr package provides a set of internally consistent tools for working with character strings, i.e. sequences of characters surrounded by quotation marks.

Detect Matches

str_detect(string, pattern) Detect the presence of a pattern match in a string. str_detect(fruit, "a")
str_which(string, pattern) Find the indexes of strings that contain a pattern match. str_which(fruit, "a")
str_count(string, pattern) Count the number of matches in a string. str_count(fruit, "a")
str_locate(string, pattern) Locate the positions of pattern matches in a string. Also str_locate_all. str_locate(fruit, "a")

Subset Strings

str_sub(string, start = 1L, end = -1L) Extract substrings from a character vector. str_sub(fruit, 1, 3); str_sub(fruit, -2)
str_subset(string, pattern) Return only the strings that contain a pattern match. str_subset(fruit, "b")
str_extract(string, pattern) Return the first pattern match found in each string, as a vector. Also str_extract_all to return every pattern match. str_extract(fruit, "[aeiou]")
str_match(string, pattern) Return the first pattern match found in each string, as a matrix with a column for each ( ) group in pattern. Also str_match_all. str_match(sentences, "(a|the) ([^ ]+)")

Manage Lengths

str_length(string) The width of strings (i.e. number of code points, which generally equals the number of characters). str_length(fruit)
str_pad(string, width, side = c("left", "right", "both"), pad = " ") Pad strings to constant width. str_pad(fruit, 17)
str_trunc(string, width, side = c("right", "left", "center"), ellipsis = "...") Truncate the width of strings, replacing content with ellipsis. str_trunc(fruit, 3)
str_trim(string, side = c("both", "left", "right")) Trim whitespace from the start and/or end of a string. str_trim(fruit)

Mutate Strings

str_sub() <- value. Replace substrings by identifying the substrings with str_sub() and assigning into the results. str_sub(fruit, 1, 3) <- "str"
str_replace(string, pattern, replacement) Replace the first matched pattern in each string. str_replace(fruit, "a", "-")
str_replace_all(string, pattern, replacement) Replace all matched patterns in each string. str_replace_all(fruit, "a", "-")
str_to_lower(string, locale = "en")¹ Convert strings to lower case. str_to_lower(sentences)
str_to_upper(string, locale = "en")¹ Convert strings to upper case. str_to_upper(sentences)
str_to_title(string, locale = "en")¹ Convert strings to title case. str_to_title(sentences)

Join and Split

str_c(..., sep = "", collapse = NULL) Join multiple strings into a single string. str_c(letters, LETTERS)
str_c(..., sep = "", collapse = NULL) Collapse a vector of strings into a single string. str_c(letters, collapse = "")
str_dup(string, times) Repeat strings times times. str_dup(fruit, times = 2)
str_split_fixed(string, pattern, n) Split a vector of strings into a matrix of substrings (splitting at occurrences of a pattern match). Also str_split to return a list of substrings. str_split_fixed(fruit, " ", n = 2)
str_glue(…, .sep = "", .envir = parent.frame()) Create a string from strings and {expressions} to evaluate. str_glue("Pi is {pi}")
str_glue_data(.x, ..., .sep = "", .envir = parent.frame(), .na = "NA") Use a data frame, list, or environment to create a string from strings and {expressions} to evaluate. str_glue_data(mtcars, "{rownames(mtcars)} has {hp} hp")

Order Strings

str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Return the vector of indexes that sorts a character vector. x[str_order(x)]
str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...)¹ Sort a character vector. str_sort(x)

Helpers

str_conv(string, encoding) Override the encoding of a string. str_conv(fruit, "ISO-8859-1")
str_view(string, pattern, match = NA) View HTML rendering of first regex match in each string. str_view(fruit, "[aeiou]")
str_view_all(string, pattern, match = NA) View HTML rendering of all regex matches. str_view_all(fruit, "[aeiou]")
str_wrap(string, width = 80, indent = 0, exdent = 0) Wrap strings into nicely formatted paragraphs. str_wrap(sentences, 20)

¹ See bit.ly/ISO639-1 for a complete list of locales.
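A few of the functions above applied to a small vector, as a sketch. The vector x is made up for illustration; the cheat sheet itself uses the built-in fruit and sentences data sets.

```r
library(stringr)

# Hypothetical input (assumed for illustration)
x <- c("apple", "banana", "pear")

has_an   <- str_detect(x, "an")             # does each string contain "an"?
n_a      <- str_count(x, "a")               # number of "a" matches per string
first3   <- str_sub(x, 1, 3)                # first three characters
dashed   <- str_replace_all(x, "a", "-")    # replace every "a"
padded   <- str_pad(x, 8)                   # left-pad to width 8
joined   <- str_c(x, collapse = ", ")       # collapse into one string
shouting <- str_to_upper(x)
```

All of these are vectorised over the input, so the same calls work unchanged on a data frame column inside dplyr::mutate().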

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at stringr.tidyverse.org • Diagrams from @LVaudor • stringr 1.2.0 • Updated: 2017-10 Regular expressions, or regexps, are a concise language for [:space:]

Need to Know

Pattern arguments in stringr are interpreted as regular expressions after any special characters have been parsed.

In R, you write regular expressions as strings, sequences of characters surrounded by quotes ("") or single quotes ('').

Some characters cannot be represented directly in an R string. These must be represented as special characters, sequences of characters that have a specific meaning, e.g.

Special Character    Represents
\\                   \
\"                   "
\n                   new line
Run ?"'" to see a complete list

Because of this, whenever a \ appears in a regular expression, you must write it as \\ in the string that represents the regular expression.

Use writeLines() to see how R views your string after all special characters have been parsed.

writeLines("\\.")
# \.
writeLines("\\ is a backslash")
# \ is a backslash

INTERPRETATION
Patterns in stringr are interpreted as regexs. To change this default, wrap the pattern in one of:

regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE, dotall = FALSE, ...) Modifies a regex to ignore cases, match end of lines as well as end of strings, allow R comments within regexes, and/or to have . match everything including \n. str_detect("I", regex("i", TRUE))
fixed() Matches raw bytes but will miss some characters that can be represented in multiple ways (fast). str_detect("\u0130", fixed("i"))
coll() Matches raw bytes and will use locale specific collation rules to recognize characters that can be represented in multiple ways (slow). str_detect("\u0130", coll("i", TRUE, locale = "tr"))
boundary() Matches boundaries between characters, line_breaks, sentences, or words. str_split(sentences, boundary("word"))

Regular Expressions - describing patterns in strings

MATCH CHARACTERS    see <- function(rx) str_view_all("abc ABC 123\t.!?\\(){}\n", rx)

string         regexp           matches                                        example
(type this)    (to mean this)   (which matches this)
a (etc.)       a (etc.)         a (etc.)                                       see("a")
\\.            \.               .                                              see("\\.")
\\!            \!               !                                              see("\\!")
\\?            \?               ?                                              see("\\?")
\\\\           \\               \                                              see("\\\\")
\\(            \(               (                                              see("\\(")
\\)            \)               )                                              see("\\)")
\\{            \{               {                                              see("\\{")
\\}            \}               }                                              see("\\}")
\\n            \n               new line (return)                              see("\\n")
\\t            \t               tab                                            see("\\t")
\\s            \s               any whitespace (\S for non-whitespaces)        see("\\s")
\\d            \d               any digit (\D for non-digits)                  see("\\d")
\\w            \w               any word character (\W for non-word chars)     see("\\w")
\\b            \b               word boundaries                                see("\\b")
               [:digit:] (1)    digits                                         see("[:digit:]")
               [:alpha:] (1)    letters                                        see("[:alpha:]")
               [:lower:] (1)    lowercase letters                              see("[:lower:]")
               [:upper:] (1)    uppercase letters                              see("[:upper:]")
               [:alnum:] (1)    letters and numbers                            see("[:alnum:]")
               [:punct:] (1)    punctuation                                    see("[:punct:]")
               [:graph:] (1)    letters, numbers, and punctuation              see("[:graph:]")
               [:space:] (1)    space characters (i.e. \s)                     see("[:space:]")
               [:blank:] (1)    space and tab (but not new line)               see("[:blank:]")
               .                every character except a new line              see(".")
(1) Many base R functions require classes to be wrapped in a second set of [ ], e.g. [[:digit:]]

[Figure: nested character classes. [:space:] contains new line and [:blank:] (space, tab). [:graph:] contains [:punct:] (. , : ; ? ! \ | / ` = * + - ^ _ ~ " ' [ ] { } ( ) < > @ # $) and [:alnum:], which contains [:digit:] (0 to 9) and [:alpha:], which contains [:lower:] (a to z) and [:upper:] (A to Z).]

ALTERNATES    alt <- function(rx) str_view_all("abcde", rx)

regexp     matches         example
ab|d       or              alt("ab|d")
[abe]      one of          alt("[abe]")
[^abe]     anything but    alt("[^abe]")
[a-c]      range           alt("[a-c]")

ANCHORS    anchor <- function(rx) str_view_all("aaa", rx)

regexp     matches            example
^a         start of string    anchor("^a")
a$         end of string      anchor("a$")

LOOK AROUNDS    look <- function(rx) str_view_all("bacad", rx)

regexp     matches            example
a(?=c)     followed by        look("a(?=c)")
a(?!c)     not followed by    look("a(?!c)")
(?<=b)a    preceded by        look("(?<=b)a")

QUANTIFIERS    quant <- function(rx) str_view_all(".a.aa.aaa", rx)

regexp     matches              example
a?         zero or one          quant("a?")
a*         zero or more         quant("a*")
a+         one or more          quant("a+")
a{n}       exactly n            quant("a{2}")
a{n, }     n or more            quant("a{2,}")
a{n, m}    between n and m      quant("a{2,4}")

GROUPS    ref <- function(rx) str_view_all("abbaab", rx)

Use parentheses to set precedent (order of evaluation) and create groups

regexp     matches            example
(ab|d)e    sets precedence    alt("(ab|d)e")

Use an escaped number to refer to and duplicate parentheses groups that occur earlier in a pattern. Refer to each group by its order of appearance

string         regexp           matches                 example
(type this)    (to mean this)   (which matches this)    (the result is the same as ref("abba"))
(?
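The double-escaping rule above can be checked without stringr. The following is a minimal sketch in base R; the helper name matches() is made up for this example, and base R needs the doubled brackets ([[:upper:]]) mentioned in footnote (1).

```r
# Test string from the MATCH CHARACTERS section.
x <- "abc ABC 123\t.!?\\(){}\n"

# The R string "\\." holds two characters: a backslash and a dot.
nchar("\\.")        # 2
writeLines("\\.")   # prints \.

# Hypothetical helper: return every match of a regular expression in x.
matches <- function(rx) regmatches(x, gregexpr(rx, x, perl = TRUE))[[1]]

matches("\\.")          # "."
matches("\\d")          # "1" "2" "3"
matches("[[:upper:]]")  # "A" "B" "C"
matches("a(?=b)")       # "a"
matches("\\\\")         # one match: a single backslash
```

The perl = TRUE argument is only there so that the look-around pattern is accepted; the other patterns also work with the default engine.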

Date-times

A date-time is a point on the timeline, stored as the number of seconds since 1970-01-01 00:00:00 UTC
dt <- as_datetime(1511870400)
## "2017-11-28 12:00:00 UTC"

A date is a day stored as the number of days since 1970-01-01
d <- as_date(17498)
## "2017-11-28"

An hms is a time stored as the number of seconds since 00:00:00
t <- hms::as.hms(85)
## 00:01:25

PARSE DATE-TIMES (Convert strings or numbers to date-times)
1. Identify the order of the year (y), month (m), day (d), hour (h), minute (m) and second (s) elements in your data.
2. Use the function below whose name replicates the order. Each accepts a wide variety of input formats.

ymd_hms(), ymd_hm(), ymd_h(). ymd_hms("2017-11-28T14:02:00")
ydm_hms(), ydm_hm(), ydm_h(). ydm_hms("2017-22-12 10:00:00")
mdy_hms(), mdy_hm(), mdy_h(). mdy_hms("11/28/2017 1:02:03")
dmy_hms(), dmy_hm(), dmy_h(). dmy_hms("1 Jan 2017 23:59:59")
ymd(), ydm(). ymd(20170131)
mdy(), myd(). mdy("July 4th, 2000")
dmy(), dym(). dmy("4th of July '99")
yq() Q for quarter. yq("2001: Q3")
hms::hms() Also lubridate::hms(), hm() and ms(), which return periods.* hms::hms(sec = 0, min= 1, hours = 2)
date_decimal(decimal, tz = "UTC") date_decimal(2017.5)
now(tzone = "") Current time in tz (defaults to system tz). now()
today(tzone = "") Current date in a tz (defaults to system tz). today()
fast_strptime() Faster strptime. fast_strptime('9/1/01', '%y/%m/%d')
parse_date_time() Easier strptime. parse_date_time("9/1/01", "ymd")

GET AND SET COMPONENTS
Use an accessor function to get a component. Assign into an accessor function to change a component in place.
d ## "2017-11-28"
day(d) ## 28
day(d) <- 1
d ## "2017-11-01"

date(x) Date component. date(dt)
year(x) Year. year(dt)
isoyear(x) The ISO 8601 year.
epiyear(x) Epidemiological year.
month(x, label, abbr) Month. month(dt)
day(x) Day of month. day(dt)
wday(x,label,abbr) Day of week.
qday(x) Day of quarter.
hour(x) Hour. hour(dt)
minute(x) Minutes. minute(dt)
second(x) Seconds. second(dt)
week(x) Week of the year. week(dt)
isoweek() ISO 8601 week.
epiweek() Epidemiological week.
quarter(x, with_year = FALSE) Quarter. quarter(dt)
semester(x, with_year = FALSE) Semester. semester(dt)
am(x) Is it in the am? am(dt)
pm(x) Is it in the pm? pm(dt)
dst(x) Is it daylight savings? dst(d)
leap_year(x) Is it a leap year? leap_year(d)
update(object, ..., simple = FALSE) update(dt, mday = 2, hour = 1)

Round Date-times
floor_date(x, unit = "second") Round down to nearest unit. floor_date(dt, unit = "month")
round_date(x, unit = "second") Round to nearest unit. round_date(dt, unit = "month")
ceiling_date(x, unit = "second", change_on_boundary = NULL) Round up to nearest unit. ceiling_date(dt, unit = "month")
rollback(dates, roll_to_first = FALSE, preserve_hms = TRUE) Roll back to last day of previous month. rollback(dt)

Stamp Date-times
stamp() Derive a template from an example string and return a new function that will apply the template to date-times. Also stamp_date() and stamp_time().
1. Derive a template, create a function
sf <- stamp("Created Sunday, Jan 17, 1999 3:34")
2. Apply the template to dates
sf(ymd("2010-04-05"))
## [1] "Created Monday, Apr 05, 2010 00:00"
Tip: use a date with day > 12

Time Zones
R recognizes ~600 time zones. Each encodes the time zone, Daylight Savings Time, and historical calendar variations for an area. R assigns one time zone per vector.
Use the UTC time zone to avoid Daylight Savings.
OlsonNames() Returns a list of valid time zone names. OlsonNames()
with_tz(time, tzone = "") Get the same date-time in a new time zone (a new clock time). with_tz(dt, "US/Pacific")
force_tz(time, tzone = "") Get the same clock time in a new time zone (a new date-time). force_tz(dt, "US/Pacific")
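The difference between with_tz() and force_tz() is easy to mix up, so here is a short sketch using the dt value from this sheet. It assumes lubridate is installed and that the "US/Pacific" zone name is available on the system; the variable names pacific_view and pacific_noon are made up for the example.

```r
library(lubridate)

dt <- as_datetime(1511870400)      # 2017-11-28 12:00:00 UTC

# Rounding to the month
floor_date(dt, unit = "month")     # 2017-11-01 UTC
ceiling_date(dt, unit = "month")   # 2017-12-01 UTC

# with_tz() keeps the instant and shows it on a different clock.
pacific_view <- with_tz(dt, "US/Pacific")
format(pacific_view, "%H:%M")      # "04:00"

# force_tz() keeps the clock reading and therefore moves the instant.
pacific_noon <- force_tz(dt, "US/Pacific")
as.numeric(pacific_noon) - as.numeric(dt)   # 28800 seconds, i.e. 8 hours later
```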

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at lubridate.tidyverse.org • lubridate 1.6.0 • Updated: 2017-12

Math with Date-times
Lubridate provides three classes of timespans to facilitate math with dates and date-times.

Math with date-times relies on the timeline, which behaves inconsistently. Consider how the timeline behaves during:

A normal day
nor <- ymd_hms("2018-01-01 01:30:00",tz="US/Eastern")

The start of daylight savings (spring forward)
gap <- ymd_hms("2018-03-11 01:30:00",tz="US/Eastern")

The end of daylight savings (fall back)
lap <- ymd_hms("2018-11-04 00:30:00",tz="US/Eastern")

Leap years and leap seconds
leap <- ymd("2019-03-01")

Periods track changes in clock times, which ignore time line irregularities.
nor + minutes(90)
gap + minutes(90)
lap + minutes(90)
leap + years(1)

Durations track the passage of physical time, which deviates from clock time when irregularities occur.
nor + dminutes(90)
gap + dminutes(90)
lap + dminutes(90)
leap + dyears(1)

Intervals represent specific intervals of the timeline, bounded by start and end date-times.
interval(nor, nor + minutes(90))
interval(gap, gap + minutes(90))
interval(lap, lap + minutes(90))
interval(leap, leap + years(1))

Not all years are 365 days due to leap days.
Not all minutes are 60 seconds due to leap seconds.

It is possible to create an imaginary date by adding months, e.g. February 31st
jan31 <- ymd(20180131)
jan31 + months(1)
## NA

%m+% and %m-% will roll imaginary dates to the last day of the previous month.
jan31 %m+% months(1)
## "2018-02-28"

add_with_rollback(e1, e2, roll_to_first = TRUE) will roll imaginary dates to the first day of the new month.
add_with_rollback(jan31, months(1), roll_to_first = TRUE)
## "2018-03-01"
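As a small worked check of the spring-forward case and the imaginary-date rules, the sketch below reuses gap and jan31 from this sheet. It assumes lubridate is installed; the variable name later is made up for the example.

```r
library(lubridate)

# A duration always adds physical time, even across the spring-forward gap.
gap <- ymd_hms("2018-03-11 01:30:00", tz = "US/Eastern")
later <- gap + dminutes(90)
as.numeric(difftime(later, gap, units = "mins"))   # 90
format(later, "%H:%M")                             # "04:00", since 02:00 to 02:59 does not exist

# Adding a month to January 31st produces an imaginary date.
jan31 <- ymd(20180131)
is.na(jan31 + months(1))                                    # TRUE
jan31 %m+% months(1)                                        # "2018-02-28"
add_with_rollback(jan31, months(1), roll_to_first = TRUE)   # "2018-03-01"
```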

PERIODS
Add or subtract periods to model events that happen at specific clock times, like the NYSE opening bell.

Make a period with the name of a time unit pluralized, e.g.
p <- months(3) + days(12)
p
"3m 12d 0H 0M 0S" (number of months, number of days, etc.)

years(x = 1) x years.
months(x) x months.
weeks(x = 1) x weeks.
days(x = 1) x days.
hours(x = 1) x hours.
minutes(x = 1) x minutes.
seconds(x = 1) x seconds.
milliseconds(x = 1) x milliseconds.
microseconds(x = 1) x microseconds.
nanoseconds(x = 1) x nanoseconds.
picoseconds(x = 1) x picoseconds.

period(num = NULL, units = "second", ...) An automation friendly period constructor. period(5, unit = "years")
as.period(x, unit) Coerce a timespan to a period, optionally in the specified units. Also is.period(). as.period(i)
period_to_seconds(x) Convert a period to the "standard" number of seconds implied by the period. Also seconds_to_period(). period_to_seconds(p)

DURATIONS
Add or subtract durations to model physical processes, like battery life. Durations are stored as seconds, the only time unit with a consistent length. Difftimes are a class of durations found in base R.

Make a duration with the name of a period prefixed with a d, e.g.
dd <- ddays(14)
dd
"1209600s (~2 weeks)" (exact length in seconds, equivalent in common units)

dyears(x = 1) 31536000x seconds.
dweeks(x = 1) 604800x seconds.
ddays(x = 1) 86400x seconds.
dhours(x = 1) 3600x seconds.
dminutes(x = 1) 60x seconds.
dseconds(x = 1) x seconds.
dmilliseconds(x = 1) x × 10^-3 seconds.
dmicroseconds(x = 1) x × 10^-6 seconds.
dnanoseconds(x = 1) x × 10^-9 seconds.
dpicoseconds(x = 1) x × 10^-12 seconds.

duration(num = NULL, units = "second", ...) An automation friendly duration constructor. duration(5, unit = "years")
as.duration(x, …) Coerce a timespan to a duration. Also is.duration(), is.difftime(). as.duration(i)
make_difftime(x) Make difftime with the specified number of units. make_difftime(99999)

INTERVALS
Divide an interval by a duration to determine its physical length, divide an interval by a period to determine its implied length in clock time.

Make an interval with interval() or %--%, e.g.
i <- interval(ymd("2017-01-01"), d) ## 2017-01-01 UTC--2017-11-28 UTC
j <- d %--% ymd("2017-12-31") ## 2017-11-28 UTC--2017-12-31 UTC

a %within% b Does interval or date-time a fall within interval b? now() %within% i
int_start(int) Access/set the start date-time of an interval. Also int_end(). int_start(i) <- now(); int_start(i)
int_aligns(int1, int2) Do two intervals share a boundary? Also int_overlaps(). int_aligns(i, j)
int_diff(times) Make the intervals that occur between the date-times in a vector. v <-c(dt, dt + 100, dt + 1000); int_diff(v)
int_flip(int) Reverse the direction of an interval. Also int_standardize(). int_flip(i)
int_length(int) Length in seconds. int_length(i)
int_shift(int, by) Shifts an interval up or down the timeline by a timespan. int_shift(i, days(-1))
as.interval(x, start, …) Coerce a timespan to an interval with the start date-time. Also is.interval(). as.interval(days(1), start = now())
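To make "divide an interval by a duration" concrete, here is a brief sketch built on the interval i from this sheet, written with explicit dates so that it stands alone. It assumes lubridate is installed.

```r
library(lubridate)

# Same interval as i in the sheet: 2017-01-01 UTC--2017-11-28 UTC
i <- interval(ymd("2017-01-01"), ymd("2017-11-28"))

# Physical length: divide by a duration.
i / ddays(1)                     # 331
int_length(i)                    # 28598400 (331 * 86400 seconds)

# Membership test for a date inside the interval.
ymd("2017-06-01") %within% i     # TRUE

# Flipping reverses the direction, so the length changes sign.
int_length(int_flip(i))          # -28598400
```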

Package Development : : CHEAT SHEET

Package Structure
A package is a convention for organizing files into directories.
This sheet shows how to work with the 7 most common parts of an R package:

Package
  DESCRIPTION    SETUP
  R/             WRITE CODE
  tests/         TEST
  man/           DOCUMENT
  vignettes/     TEACH
  data/          ADD DATA
  NAMESPACE      ORGANIZE

The contents of a package can be stored on disk as a:
• source - a directory with sub-directories (as above)
• bundle - a single compressed file (.tar.gz)
• binary - a single compressed file optimized for a specific OS
Or installed into an R library (loaded into memory during an R session) or archived online in a repository. Use the functions below to move between these states.

[Figure: grid showing which states (Repository, Source, Bundle, Binary, Installed, In memory) each function moves a package between. Functions listed: install.packages() (CRAN), install.packages(type = "source") (CRAN), R CMD install, devtools::install(), devtools::build(), devtools::install_github() (github), devtools::load_all(), Build & Reload (RStudio), library(). Locations: Internet, On disk, library, memory.]

devtools::use_build_ignore("file") Adds file to .Rbuildignore, a list of files that will not be included when package is built.

Visit r-pkgs.had.co.nz to learn much more about writing and publishing packages for R

Setup (DESCRIPTION)
The DESCRIPTION file describes your work, sets up how your package will work with other packages, and applies a copyright.
You must have a DESCRIPTION file
Add the packages that yours relies on with devtools::use_package() Adds a package to the Imports or Suggests field

Package: mypackage
Title: Title of Package
Version: 0.1.0
Authors@R: person("Hadley", "Wickham", email = "[email protected]", role = c("aut", "cre"))
Description: What the package does (one paragraph)
Depends: R (>= 3.1.0)
License: GPL-2
LazyData: true
Imports:
    dplyr (>= 0.4.0),
    ggvis (>= 0.2)
Suggests:
    knitr (>= 0.1.0)

Import packages that your package must have to work. R will install them when it installs your package.
Suggest packages that are not very essential to yours. Users can install them manually, or not, as they like.

CC0 No strings attached.
MIT MIT license applies to your code if re-shared.
GPL-2 GPL-2 license applies to your code, and all code anyone bundles with it, if re-shared.

Write Code (R/)
All of the R code in your package goes in R/. A package with just an R/ directory is still a very useful package.
Create a new package project with devtools::create("path/to/name") Create a template to develop into a package.
Save your code in R/ as scripts (extension .R)

WORKFLOW
1. Edit your code.
2. Load your code with one of
   devtools::load_all() Re-loads all saved files in R/ into memory.
   Ctrl/Cmd + Shift + L (keyboard shortcut) Saves all open files then calls load_all().
3. Experiment in the console.
4. Repeat.
• Use consistent style with r-pkgs.had.co.nz/r.html#style
• Click on a function and press F2 to open its definition
• Search for a function with Ctrl + .

Test (tests/)
Use tests/ to store tests that will alert you if your code breaks.
Add a tests/ directory
Import testthat with devtools::use_testthat(), which sets up package to use automated tests with testthat
Write tests with context(), test_that(), and expect statements
Save your tests as .R files in tests/testthat/

WORKFLOW
1. Modify your code or tests.
2. Test your code with one of
   devtools::test() Runs all tests in tests/
   Ctrl/Cmd + Shift + T (keyboard shortcut)
3. Repeat until all tests pass

Example Test
context("Arithmetic")
test_that("Math works", {
  expect_equal(1 + 1, 2)
  expect_equal(1 + 2, 3)
  expect_equal(1 + 3, 4)
})

Expect statement      Tests
expect_equal()        is equal within small numerical tolerance?
expect_identical()    is exactly equal?
expect_match()        matches specified string or regular expression?
expect_output()       prints specified output?
expect_message()      displays specified message?
expect_warning()      displays specified warning?
expect_error()        throws specified error?
expect_is()           output inherits from certain class?
expect_false()        returns FALSE?
expect_true()         returns TRUE?
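The expect statements listed above differ mainly in how strict they are. The sketch below, which assumes testthat is installed and uses a hypothetical add() function as the code under test, shows expect_equal() tolerating floating-point error that an exact comparison would reject.

```r
library(testthat)

# Hypothetical function under test; in a package this would live in R/.
add <- function(x, y) x + y

test_that("add() handles numbers", {
  expect_equal(add(1, 1), 2)
  # 0.1 + 0.2 is not exactly 0.3 in floating point, but it is within tolerance.
  expect_equal(add(0.1, 0.2), 0.3)
  expect_false(identical(add(0.1, 0.2), 0.3))
  # Non-numeric input throws an error.
  expect_error(add("a", 1))
})
```

In a real package the test_that() block would be saved under tests/testthat/ and run with devtools::test().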

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at http://r-pkgs.had.co.nz/ • devtools 1.5.1 • Updated: 2015-01

Document (man/)
man/ contains the documentation for your functions, the help pages in your package.
Use roxygen comments to document each function beside its definition
Document the name of each exported data set
Include helpful examples for each function

WORKFLOW
1. Add roxygen comments in your .R files
2. Convert roxygen comments into documentation with one of:
   devtools::document() Converts roxygen comments to .Rd files and places them in man/. Builds NAMESPACE.
   Ctrl/Cmd + Shift + D (Keyboard Shortcut)
3. Open help pages with ? to preview documentation
4. Repeat

.Rd FORMATTING TAGS
\emph{italic text}        \email{name@@foo.com}
\strong{bold text}        \href{url}{display}
\code{function(args)}     \url{url}
\pkg{package}             \link[=dest]{display}
\dontrun{code}            \linkS4class{class}
\dontshow{code}           \code{\link{function}}
\donttest{code}           \code{\link[package]{function}}
\deqn{a + b (block)}
\eqn{a + b (inline)}
\tabular{lcr}{
  left \tab centered \tab right \cr
  cell \tab cell \tab cell \cr
}

ROXYGEN2
The roxygen2 package lets you write documentation inline in your .R files with a shorthand syntax. devtools implements roxygen2 to make documentation.
• Add roxygen documentation as comment lines that begin with #'.
• Place comment lines directly above the code that defines the object documented.
• Place a roxygen @ tag (listed below) after #' to supply a specific section of documentation.
• Untagged lines will be used to generate a title, description, and details section (in that order)

#' Add together two numbers.
#'
#' @param x A number.
#' @param y A number.
#' @return The sum of \code{x} and \code{y}.
#' @examples
#' add(1, 1)
#' @export
add <- function(x, y) {
  x + y
}

COMMON ROXYGEN TAGS
@aliases       @inheritParams    @seealso
@concepts      @keywords         @format (data)
@describeIn    @param            @source (data)
@examples      @rdname           @include
@export        @return           @slot (S4)
@family        @section          @field (RC)

Add Data (data/)
The data/ directory allows you to include data with your package.
Save data as .Rdata files (suggested)
Store data in one of data/, R/sysdata.rda, inst/extdata
Always use LazyData: true in your DESCRIPTION file.

devtools::use_data() Adds a data object to data/ (R/sysdata.rda if internal = TRUE)
devtools::use_data_raw() Adds an R Script used to clean a data set to data-raw/. Includes data-raw/ on .Rbuildignore.

Store data in
• data/ to make data available to package users
• R/sysdata.rda to keep data internal for use by your functions.
• inst/extdata to make raw data available for loading and parsing examples. Access this data with system.file()

Organize (NAMESPACE)
The NAMESPACE file helps you make your package self-contained: it won't interfere with other packages, and other packages won't interfere with it.
Export functions for users by placing @export in their roxygen comments
Import objects from other packages with package::object (recommended) or @import, @importFrom, @importClassesFrom, @importMethodsFrom (not always recommended)

WORKFLOW
1. Modify your code or tests.
2. Document your package (devtools::document())
3. Check NAMESPACE
4. Repeat until NAMESPACE is correct

SUBMIT YOUR PACKAGE
r-pkgs.had.co.nz/release.html

Teach (vignettes/)
vignettes/ holds documents that teach your users how to solve real problems with your tools.
Create a vignettes/ directory and a template vignette with devtools::use_vignette() Adds template vignette as vignettes/my-vignette.Rmd.
Append YAML headers to your vignettes (like the one below)
Write the body of your vignettes in R Markdown (rmarkdown.rstudio.com)

---
title: "Vignette Title"
author: "Vignette Author"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---
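system.file() is the usual way to reach files shipped inside an installed package. The sketch below uses the built-in stats package purely as a stand-in for your own package, so the file names are illustrative: it locates the installed DESCRIPTION file and reads two of its fields with base R.

```r
# Resolve a path inside an installed package (stats ships with R).
path <- system.file("DESCRIPTION", package = "stats")

# DESCRIPTION uses the Debian control file format, which read.dcf() parses.
desc <- read.dcf(path, fields = c("Package", "Version"))
desc[1, "Package"]    # "stats"

# A file that does not exist yields an empty string rather than an error.
system.file("extdata", "missing.csv", package = "stats")   # ""
```

For raw data in your own package the call would take the form system.file("extdata", "file.csv", package = "mypackage"), where both names are placeholders.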

Deep Learning with Keras : : CHEAT SHEET

Intro
Keras is a high-level neural networks API developed with a focus on enabling fast experimentation. It supports multiple back-ends, including TensorFlow, CNTK and Theano.
TensorFlow is a lower level mathematical library for building deep neural network architectures. The keras R package makes it easy to use Keras and TensorFlow in R.

Define > Compile > Fit > Evaluate > Predict
Define: Model, Sequential model, Multi-GPU model
Compile: Optimiser, Loss, Metrics
Fit: Batch size, Epochs, Validation split
Evaluate: Evaluate, Plot
Predict: classes, probability

https://keras.rstudio.com
https://www.manning.com/books/deep-learning-with-r
The "Hello, World!" of deep learning

INSTALLATION
The keras R package uses the Python keras library. You can install all the prerequisites directly from R.
https://keras.rstudio.com/reference/install_keras.html
library(keras)
install_keras()
See ?install_keras for GPU instructions
This installs the required libraries in an Anaconda environment or virtual environment 'r-tensorflow'.

Working with keras models

DEFINE A MODEL
keras_model() Keras Model
keras_model_sequential() Keras Model composed of a linear stack of layers
multi_gpu_model() Replicates a model on different GPUs

COMPILE A MODEL
compile(object, optimizer, loss, metrics = NULL) Configure a Keras model for training

FIT A MODEL
fit(object, x = NULL, y = NULL, batch_size = NULL, epochs = 10, verbose = 1, callbacks = NULL, …) Train a Keras model for a fixed number of epochs (iterations)
fit_generator() Fits the model on data yielded batch-by-batch by a generator
train_on_batch() test_on_batch() Single gradient update or model evaluation over one batch of samples

EVALUATE A MODEL
evaluate(object, x = NULL, y = NULL, batch_size = NULL) Evaluate a Keras model
evaluate_generator() Evaluates the model on a data generator

PREDICT
predict() Generate predictions from a Keras model
predict_proba() and predict_classes() Generates probability or class probability predictions for the input samples
predict_on_batch() Returns predictions for a single batch of samples
predict_generator() Generates predictions for the input samples from a data generator

OTHER MODEL OPERATIONS
summary() Print a summary of a Keras model
export_savedmodel() Export a saved model
get_layer() Retrieves a layer based on either its name (unique) or index
pop_layer() Remove the last layer in a model
save_model_hdf5(); load_model_hdf5() Save/Load models using HDF5 files
serialize_model(); unserialize_model() Serialize a model to an R object
clone_model() Clone a model instance
freeze_weights(); unfreeze_weights() Freeze and unfreeze weights

CORE LAYERS
layer_input() Input layer
layer_dense() Add a densely-connected NN layer to an output
layer_activation() Apply an activation function to an output
layer_dropout() Applies Dropout to the input
layer_reshape() Reshapes an output to a certain shape
layer_permute() Permute the dimensions of an input according to a given pattern
layer_repeat_vector() Repeats the input n times
layer_lambda(object, f) Wraps arbitrary expression as a layer
layer_activity_regularization() Layer that applies an update to the cost function based input activity
layer_masking() Masks a sequence by using a mask value to skip timesteps
layer_flatten() Flattens an input

TRAINING AN IMAGE RECOGNIZER ON MNIST DATA

# input layer: use MNIST images
mnist <- dataset_mnist()
x_train <- mnist$train$x; y_train <- mnist$train$y
x_test <- mnist$test$x; y_test <- mnist$test$y

# reshape and rescale
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255; x_test <- x_test / 255

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)

# defining the model and layers
model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

# compile (define loss and optimizer)
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)

# train (fit)
model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
model %>% evaluate(x_test, y_test)
model %>% predict_classes(x_test)
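The "reshape and rescale" step in the MNIST example prepares labels and pixels for the network. As a rough illustration only, the base R sketch below mimics what one-hot encoding of class labels produces; one_hot() is a hypothetical helper written for this example, not the keras implementation of to_categorical().

```r
# Hypothetical helper: one-hot encode integer class labels 0..(num_classes - 1).
one_hot <- function(y, num_classes) {
  m <- matrix(0, nrow = length(y), ncol = num_classes)
  # Row r gets a 1 in column y[r] + 1, because labels start at 0.
  m[cbind(seq_along(y), y + 1)] <- 1
  m
}

labels <- c(5, 0, 4)
encoded <- one_hot(labels, 10)
dim(encoded)        # 3 10
encoded[1, ]        # 0 0 0 0 0 1 0 0 0 0

# Rescaling maps 8-bit pixel intensities from [0, 255] to [0, 1].
pixels <- c(0, 128, 255)
pixels / 255
```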

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at keras.rstudio.com • keras 2.1.2 • Updated: 2017-12

More layers

CONVOLUTIONAL LAYERS
layer_conv_1d() 1D, e.g. temporal convolution
layer_conv_2d_transpose() Transposed 2D (deconvolution)
layer_conv_2d() 2D, e.g. spatial convolution over images
layer_conv_3d_transpose() Transposed 3D (deconvolution)
layer_conv_3d() 3D, e.g. spatial convolution over volumes
layer_conv_lstm_2d() Convolutional LSTM
layer_separable_conv_2d() Depthwise separable 2D
layer_upsampling_1d() layer_upsampling_2d() layer_upsampling_3d() Upsampling layer
layer_zero_padding_1d() layer_zero_padding_2d() layer_zero_padding_3d() Zero-padding layer
layer_cropping_1d() layer_cropping_2d() layer_cropping_3d() Cropping layer

POOLING LAYERS
layer_max_pooling_1d() layer_max_pooling_2d() layer_max_pooling_3d() Maximum pooling for 1D to 3D
layer_average_pooling_1d() layer_average_pooling_2d() layer_average_pooling_3d() Average pooling for 1D to 3D
layer_global_max_pooling_1d() layer_global_max_pooling_2d() layer_global_max_pooling_3d() Global maximum pooling
layer_global_average_pooling_1d() layer_global_average_pooling_2d() layer_global_average_pooling_3d() Global average pooling

ACTIVATION LAYERS
layer_activation(object, activation) Apply an activation function to an output
layer_activation_leaky_relu() Leaky version of a rectified linear unit
layer_activation_parametric_relu() Parametric rectified linear unit
layer_activation_thresholded_relu() Thresholded rectified linear unit
layer_activation_elu() Exponential linear unit

DROPOUT LAYERS
layer_dropout() Applies dropout to the input
layer_spatial_dropout_1d() layer_spatial_dropout_2d() layer_spatial_dropout_3d() Spatial 1D to 3D version of dropout

RECURRENT LAYERS
layer_simple_rnn() Fully-connected RNN where the output is to be fed back to input
layer_gru() Gated recurrent unit - Cho et al
layer_cudnn_gru() Fast GRU implementation backed by CuDNN
layer_lstm() Long-Short Term Memory unit - Hochreiter 1997
layer_cudnn_lstm() Fast LSTM implementation backed by CuDNN

LOCALLY CONNECTED LAYERS
layer_locally_connected_1d() layer_locally_connected_2d() Similar to convolution, but weights are not shared, i.e. different filters for each patch

Preprocessing

SEQUENCE PREPROCESSING
pad_sequences() Pads each sequence to the same length (length of the longest sequence)
skipgrams() Generates skipgram word pairs
make_sampling_table() Generates word rank-based probabilistic sampling table

TEXT PREPROCESSING
text_tokenizer() Text tokenization utility
fit_text_tokenizer() Update tokenizer internal vocabulary
save_text_tokenizer(); load_text_tokenizer() Save a text tokenizer to an external file
texts_to_sequences(); texts_to_sequences_generator() Transforms each text in texts to sequence of integers
texts_to_matrix(); sequences_to_matrix() Convert a list of sequences into a matrix
text_one_hot() One-hot encode text to word indices
text_hashing_trick() Converts a text to a sequence of indexes in a fixed-size hashing space
text_to_word_sequence() Convert text to a sequence of words (or tokens)

IMAGE PREPROCESSING
image_load() Loads an image into PIL format.
flow_images_from_data() flow_images_from_directory() Generates batches of augmented/normalized data from images and labels, or a directory
image_data_generator() Generate minibatches of image data with real-time data augmentation.
fit_image_data_generator() Fit image data generator internal statistics to some sample data
generator_next() Retrieve the next item
image_to_array(); image_array_resize() image_array_save() 3D array representation

Pre-trained models
Keras applications are deep learning models that are made available alongside pre-trained weights. These models can be used for prediction, feature extraction, and fine-tuning.

application_xception() xception_preprocess_input() Xception v1 model
application_inception_v3() inception_v3_preprocess_input() Inception v3 model, with weights pre-trained on ImageNet
application_inception_resnet_v2() inception_resnet_v2_preprocess_input() Inception-ResNet v2 model, with weights trained on ImageNet
application_vgg16(); application_vgg19() VGG16 and VGG19 models
application_resnet50() ResNet50 model
application_mobilenet() mobilenet_preprocess_input() mobilenet_decode_predictions() mobilenet_load_model_hdf5() MobileNet model architecture

ImageNet is a large database of images with labels, extensively used for deep learning
imagenet_preprocess_input() imagenet_decode_predictions() Preprocesses a tensor encoding a batch of images for ImageNet, and decodes predictions

Callbacks
A callback is a set of functions to be applied at given stages of the training procedure. You can use callbacks to get a view on internal states and statistics of the model during training.

callback_early_stopping() Stop training when a monitored quantity has stopped improving
callback_learning_rate_scheduler() Learning rate scheduler
callback_tensorboard() TensorBoard basic visualizations
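To show what "pads each sequence to the same length" means in practice, here is a base R sketch. pad_to_longest() is a hypothetical helper written for this example; it pads with zeros at the front, which is an assumption about the default behaviour, and it is not the keras pad_sequences() implementation.

```r
# Hypothetical helper: left-pad numeric sequences with `value` up to the longest length.
pad_to_longest <- function(seqs, value = 0) {
  n <- max(lengths(seqs))
  # vapply() returns one column per sequence, so transpose to get one row each.
  t(vapply(seqs, function(s) c(rep(value, n - length(s)), s), numeric(n)))
}

padded <- pad_to_longest(list(c(1, 2, 3), c(4, 5), 6))
padded
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    0    4    5
# [3,]    0    0    6
```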
Data Science in Spark with Sparklyr : : CHEAT SHEET

Intro
sparklyr is an R interface for Apache Spark™. It provides a complete dplyr backend and the option to query directly using Spark SQL statements. With sparklyr, you can orchestrate distributed machine learning using either Spark's MLlib or H2O Sparkling Water.

Starting with version 1.044, RStudio Desktop, Server and Pro include integrated support for the sparklyr package. You can create and manage connections to Spark clusters and local Spark instances from inside the IDE.

RStudio Integrates with sparklyr
[Figure: Connections pane with callouts: Open connection log, Disconnect, Open the Spark UI, Spark & Hive Tables, Preview 1K rows]

Cluster Deployment
[Figure: MANAGED CLUSTER: Driver Node, Cluster Manager (YARN or Mesos), Worker Nodes. STAND ALONE CLUSTER: Driver Node, Worker Nodes]

Data Science Toolchain with Spark + sparklyr
Import: Export an R DataFrame; Read a file; Read existing Hive table
Tidy: dplyr verb; Direct Spark SQL (DBI); SDF function (Scala API)
Transform: Transformer function
Visualize: Collect data into R for plotting
Model: Spark MLlib; H2O Extension
Communicate: Collect data into R; Share plots, documents, and apps
(Wrangle covers Tidy and Transform; Understand covers Transform, Visualize and Model)
R for Data Science, Grolemund & Wickham

Getting Started

LOCAL MODE (No cluster required)
1. Install a local version of Spark: spark_install("2.0.1")
2. Open a connection
   sc <- spark_connect(master = "local")

ON A MESOS MANAGED CLUSTER
1. Install RStudio Server or Pro on one of the existing nodes
2. Locate path to the cluster's Spark directory
3. Open a connection
   spark_connect(master="[mesos URL]", version = "1.6.2", spark_home = [Cluster's Spark path])

USING LIVY (Experimental)
1. The Livy REST application should be running on the cluster
2. Connect to the cluster
   sc <- spark_connect(method = "livy", master = "http://host:port")

ON A YARN MANAGED CLUSTER
1. Install RStudio Server or RStudio Pro on one of the existing nodes, preferably an edge node
2. Locate path to the cluster's Spark Home Directory, it normally is "/usr/lib/spark"
3. Open a connection
   spark_connect(master="yarn-client", version = "1.6.2", spark_home = [Cluster's Spark path])

ON A SPARK STANDALONE CLUSTER
1. Install RStudio Server or RStudio Pro on one of the existing nodes or a server in the same LAN
2. Install a local version of Spark: spark_install(version = "2.0.1")
3. Open a connection
   spark_connect(master="spark://host:port", version = "2.0.1", spark_home = spark_home_dir())

Tuning Spark

EXAMPLE CONFIGURATION
config <- spark_config()
config$spark.executor.cores <- 2
config$spark.executor.memory <- "4G"
sc <- spark_connect(master="yarn-client", config = config, version = "2.0.1")

IMPORTANT TUNING PARAMETERS with defaults
• spark.yarn.am.cores
• spark.yarn.am.memory 512m
• spark.network.timeout 120s
• spark.executor.memory 1g
• spark.executor.cores 1
• spark.executor.instances
• spark.executor.extraJavaOptions
• spark.executor.heartbeatInterval 10s
• sparklyr.shell.executor-memory
• sparklyr.shell.driver-memory

Using sparklyr
A brief example of a data analysis using Apache Spark, R and sparklyr in local mode

library(sparklyr); library(dplyr); library(ggplot2);
library(tidyr);
set.seed(100)

# Install Spark locally
spark_install("2.0.1")

# Connect to local version
sc <- spark_connect(master = "local")

# Copy data to Spark memory
import_iris <- copy_to(sc, iris, "spark_iris", overwrite = TRUE)

# Partition data
partition_iris <- sdf_partition(import_iris, training=0.5, testing=0.5)

# Create a hive metadata for each partition
sdf_register(partition_iris, c("spark_iris_training","spark_iris_test"))

tidy_iris <- tbl(sc,"spark_iris_training") %>%
  select(Species, Petal_Length, Petal_Width)

# Spark ML Decision Tree Model
model_iris <- tidy_iris %>%
  ml_decision_tree(response="Species", features=c("Petal_Length","Petal_Width"))

# Create reference to Spark table
test_iris <- tbl(sc,"spark_iris_test")

# Bring data back into R memory for plotting
pred_iris <- sdf_predict(model_iris, test_iris) %>%
  collect

pred_iris %>%
  inner_join(data.frame(prediction=0:2, lab=model_iris$model.parameters$labels)) %>%
  ggplot(aes(Petal_Length, Petal_Width, col=lab)) +
  geom_point()

# Disconnect
spark_disconnect(sc)

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at spark.rstudio.com • sparklyr 0.5 • Updated: 2016-12

COPY A DATA FRAME INTO SPARK
    sdf_copy_to(sc, iris, "spark_iris")
    sdf_copy_to(sc, x, name, memory, repartition, overwrite)

IMPORT INTO SPARK FROM A FILE
Arguments that apply to all functions:
    sc, name, path, options = list(), repartition = 0, memory = TRUE, overwrite = TRUE
CSV      spark_read_csv(header = TRUE, columns = NULL, infer_schema = TRUE, delimiter = ",", quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL)
JSON     spark_read_json()
PARQUET  spark_read_parquet()

SPARK SQL COMMANDS
    DBI::dbWriteTable(sc, "spark_iris", iris)
    DBI::dbWriteTable(conn, name, value)

FROM A TABLE IN HIVE
    my_var <- tbl_cache(sc, name = "hive_iris")
    tbl_cache(sc, name, force = TRUE)
        Loads the table into memory
    my_var <- dplyr::tbl(sc, name = "hive_iris")
    dplyr::tbl(src, ...)
        Creates a reference to the table without loading it into memory

Wrangle

SPARK SQL VIA DPLYR VERBS
dplyr:: Translates into Spark SQL statements
    my_table <- my_var %>%
      filter(Species == "setosa") %>%
      sample_n(10)

DIRECT SPARK SQL COMMANDS
    my_table <- DBI::dbGetQuery(sc, "SELECT * FROM iris LIMIT 10")
    DBI::dbGetQuery(conn, statement)

SCALA API VIA SDF FUNCTIONS
    sdf_mutate(.data)
        Works like dplyr mutate function
    sdf_partition(x, ..., weights = NULL, seed = sample(.Machine$integer.max, 1))
    sdf_partition(x, training = 0.5, test = 0.5)
    sdf_register(x, name = NULL)
        Gives a Spark DataFrame a table name
    sdf_sample(x, fraction = 1, replacement = TRUE, seed = NULL)
    sdf_sort(x, columns)
        Sorts by >=1 columns in ascending order
    sdf_with_unique_id(x, id = "id")
    sdf_predict(object, newdata)
        Spark DataFrame with predicted values

ML TRANSFORMERS
    ft_binarizer(my_table, input.col = "Petal_Length", output.col = "petal_large", threshold = 1.2)
Arguments that apply to all functions:
    x, input.col = NULL, output.col = NULL
    ft_binarizer(threshold = 0.5)
        Assigned values based on threshold
    ft_bucketizer(splits)
        Numeric column to discretized column
    ft_discrete_cosine_transform(inverse = FALSE)
        Time domain to frequency domain
    ft_elementwise_product(scaling.col)
        Element-wise product between 2 cols
    ft_index_to_string()
        Index labels back to label as strings
    ft_one_hot_encoder()
        Continuous to binary vectors
    ft_quantile_discretizer(n.buckets = 5L)
        Continuous to binned categorical values
    ft_sql_transformer(sql)
    ft_string_indexer(params = NULL)
        Column of labels into a column of label indices.
    ft_vector_assembler()
        Combine vectors into single row-vector

Visualize & Communicate

DOWNLOAD DATA TO R MEMORY
    r_table <- collect(my_table)
    plot(Petal_Width ~ Petal_Length, data = r_table)
    dplyr::collect(x)
        Download a Spark DataFrame to an R DataFrame
    sdf_read_column(x, column)
        Returns contents of a single column to R

SAVE FROM SPARK TO FILE SYSTEM
Arguments that apply to all functions: x, path
CSV      spark_write_csv(header = TRUE, delimiter = ",", quote = "\"", escape = "\\", charset = "UTF-8", null_value = NULL)
JSON     spark_write_json(mode = NULL)
PARQUET  spark_write_parquet(mode = NULL)

Reading & Writing from Apache Spark
[Diagram. Between R and Spark: tbl_cache, sdf_copy_to, dplyr::copy_to, dplyr::tbl, DBI::dbWriteTable, sdf_collect, dplyr::collect, sdf_read_column. Between Spark and the File System: spark_read_, spark_write_.]

Model (MLlib)
    ml_decision_tree(my_table, response = "Species", features = c("Petal_Length", "Petal_Width"))

    ml_als_factorization(x, user.column = "user", rating.column = "rating", item.column = "item", rank = 10L, regularization.parameter = 0.1, iter.max = 10L, ml.options = ml_options())
    ml_decision_tree(x, response, features, max.bins = 32L, max.depth = 5L, type = c("auto", "regression", "classification"), ml.options = ml_options())
        Same options for: ml_gradient_boosted_trees
    ml_generalized_linear_regression(x, response, features, intercept = TRUE, family = gaussian(link = "identity"), iter.max = 100L, ml.options = ml_options())
    ml_kmeans(x, centers, iter.max = 100, features = dplyr::tbl_vars(x), compute.cost = TRUE, tolerance = 1e-04, ml.options = ml_options())
    ml_lda(x, features = dplyr::tbl_vars(x), k = length(features), alpha = (50/k) + 1, beta = 0.1 + 1, ml.options = ml_options())
    ml_linear_regression(x, response, features, intercept = TRUE, alpha = 0, lambda = 0, iter.max = 100L, ml.options = ml_options())
        Same options for: ml_logistic_regression
    ml_multilayer_perceptron(x, response, features, layers, iter.max = 100, seed = sample(.Machine$integer.max, 1), ml.options = ml_options())
    ml_naive_bayes(x, response, features, lambda = 0, ml.options = ml_options())
    ml_one_vs_rest(x, classifier, response, features, ml.options = ml_options())
    ml_pca(x, features = dplyr::tbl_vars(x), ml.options = ml_options())
    ml_random_forest(x, response, features, max.bins = 32L, max.depth = 5L, num.trees = 20L, type = c("auto", "regression", "classification"), ml.options = ml_options())
    ml_survival_regression(x, response, features, intercept = TRUE, censor = "censor", iter.max = 100L, ml.options = ml_options())
    ml_binary_classification_eval(predicted_tbl_spark, label, score, metric = "areaUnderROC")
    ml_classification_eval(predicted_tbl_spark, label, predicted_lbl, metric = "f1")
    ml_tree_feature_importance(sc, model)

Extensions
Create an R package that calls the full Spark API & provide interfaces to Spark packages.

CORE TYPES
    spark_connection()   Connection between R and the Spark shell process
    spark_jobj()         Instance of a remote Spark object
    spark_dataframe()    Instance of a remote Spark DataFrame object

CALL SPARK FROM R
    invoke()             Call a method on a Java object
    invoke_new()         Create a new object by invoking a constructor
    invoke_static()      Call a static method on an object

MACHINE LEARNING EXTENSIONS
    ml_create_dummy_variables()
    ml_options()
    ml_prepare_dataframe()
    ml_model()
    ml_prepare_response_features_intercept()

Tidy evaluation with rlang :: CHEAT SHEET

Vocabulary

Tidy Evaluation (Tidy Eval) is not a package, but a framework for doing non-standard evaluation (i.e. delayed evaluation) that makes it easier to program with tidyverse functions.

Symbol - a name that represents a value or object stored in R.
    is_symbol(expr(pi))

Environment - a list-like object that binds symbols (names) to objects stored in memory. Each env contains a link to a second, parent env, which creates a chain, or search path, of environments.
    is_environment(current_env())

Quoting Code

Quote code in one of two ways (if in doubt use a quosure):

QUOSURES
Quosure - An expression that has been saved with an environment (aka a closure). A quosure can be evaluated later in the stored environment to return a predictable result.

EXPRESSION
Quoted Expression - An expression that has been saved by itself. A quoted expression can be evaluated later to return a result that will depend on the environment it is evaluated in.

rlang::caller_env(n = 1) Returns calling env of the function it is in.
rlang::child_env(.parent, ...) Creates new env as child of .parent. Also env.
rlang::current_env() Returns execution env of the function it is in.

rlang::quo(expr) Quote contents as a quosure. Also quos to quote multiple expressions.
    a <- 1; b <- 2; q <- quo(a + b); qs <- quos(a, b)
rlang::enquo(arg) Call from within a function to quote what the user passed to an argument as a quosure. Also enquos for multiple args.
    quote_this <- function(x) enquo(x)
    quote_these <- function(...) enquos(...)

rlang::expr(expr) Quote contents. Also exprs to quote multiple expressions.
    a <- 1; b <- 2; e <- expr(a + b); es <- exprs(a, b, a + b)
rlang::enexpr(arg) Call from within a function to quote what the user passed to an argument. Also enexprs to quote multiple arguments.
    quote_that <- function(x) enexpr(x)
    quote_those <- function(...) enexprs(...)
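The difference between a quosure and a bare quoted expression shows up when the quoting happens in one environment and the evaluation in another. The following is a minimal sketch, assuming rlang is installed; make_quoted is a hypothetical helper name introduced only for this illustration.

```r
library(rlang)

a <- 1
b <- 2

# Hypothetical helper: quotes `a + b` inside a function whose local
# a and b shadow the outer ones.
make_quoted <- function() {
  a <- 10
  b <- 20
  list(e = expr(a + b),  # bare expression: no environment is stored
       q = quo(a + b))   # quosure: stores this function's environment
}

quoted <- make_quoted()
top <- current_env()

# The bare expression uses whatever a and b the supplied environment holds.
res_expr <- eval_bare(quoted$e, env = top)
# The quosure is evaluated in its stored environment (a = 10, b = 20).
res_quo <- eval_tidy(quoted$q)

print(res_expr)  # 3
print(res_quo)   # 30
```

This is why the sheet recommends a quosure when in doubt: the result does not change with the place where evaluation happens.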

Constant - a bare value (i.e. an atomic vector of length 1).
    is_bare_atomic(1)

Call object - a vector of symbols/constants/calls that begins with a function name, possibly followed by arguments.
    is_call(expr(abs(1)))

rlang::new_quosure(expr, env = caller_env()) Build a quosure from a quoted expression and an environment.
    new_quosure(expr(a + b), current_env())

rlang::ensym(x) Call from within a function to quote what the user passed to an argument as a symbol, accepts strings. Also ensyms.
    quote_name <- function(name) ensym(name)
    quote_names <- function(...) ensyms(...)

Code - a sequence of symbols/constants/calls that will return a result if evaluated. Code can be:
1. Evaluated immediately (Standard Eval)
2. Quoted to use later (Non-Standard Eval)
    is_expression(expr(pi))

Expression - an object that stores quoted code without evaluating it.
    is_expression(expr(a + b))

Quosure - an object that stores both quoted code (without evaluating it) and the code's environment.
    is_quosure(quo(a + b))

rlang::quo_get_env(quo) Return the environment of a quosure.
rlang::quo_set_env(quo, env) Set the environment of a quosure.
rlang::quo_get_expr(quo) Return the expression of a quosure.

Expression Vector - a list of pieces of quoted code created by base R's expression and parse functions. Not to be confused with expression.

Parsing and Deparsing

Parse - Convert a string to a saved expression.
Deparse - Convert a saved expression to a string.

rlang::parse_expr(x) Convert a string to an expression. Also parse_exprs, sym, parse_quo, parse_quos.
    e <- parse_expr("a+b")
rlang::expr_text(expr, width = 60L, nlines = Inf) Convert expr to a string. Also quo_name.
    expr_text(e)

Evaluation

To evaluate an expression, R:
1. Looks up the symbols in the expression in the active environment (or a supplied one), followed by the environment's parents
2. Executes the calls in the expression
The result of an expression depends on which environment it is evaluated in.

QUOTED EXPRESSION
rlang::eval_bare(expr, env = parent.frame()) Evaluate expr in env.
    eval_bare(e, env = .GlobalEnv)

QUOSURES (and quoted exprs)
rlang::eval_tidy(expr, data = NULL, env = caller_env()) Evaluate expr in env, using data as a data mask. Will evaluate quosures in their stored environment.
    eval_tidy(q)

Data Mask - If data is non-NULL, eval_tidy inserts data into the search path before env, matching symbols to names in data.
Use the pronoun .data$ to force a symbol to be matched in data, and !! (see back) to force a symbol to be matched in the environments.
    a <- 1; b <- 2
    p <- quo(.data$a + !!b)
    mask <- tibble(a = 5, b = 6)
    eval_tidy(p, data = mask)

Building Calls

rlang::call2(.fn, ..., .ns = NULL) Create a call from a function and a list of args. Use exec to create and then evaluate the call. (See back page for !!!)
    args <- list(x = 4, base = 2)
    call2("log", x = 4, base = 2)    # builds the call log(x = 4, base = 2)
    call2("log", !!!args)
    exec("log", x = 4, base = 2)     # builds and evaluates the call: 2
    exec("log", !!!args)
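To make the lookup order of the data mask concrete, here is a small sketch assuming rlang is installed. It uses a base data.frame as the mask instead of the sheet's tibble so that it has no further dependencies; the variable names r_masked and r_forced are introduced only for this illustration.

```r
library(rlang)

a <- 1
b <- 2
mask <- data.frame(a = 5, b = 6)

# Plain symbols are matched in the data mask first, so both a and b
# come from `mask`.
r_masked <- eval_tidy(quo(a + b), data = mask)

# .data$a is forced to the mask; !!b is replaced by the environment's
# value of b (2) at the moment the quosure is created.
r_forced <- eval_tidy(quo(.data$a + !!b), data = mask)

print(r_masked)  # 11
print(r_forced)  # 7
```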

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at tidyeval.tidyverse.org • rlang 0.3.0 • Updated: 2018-11

Quasiquotation (!!, !!!, :=)

QUOTATION
Storing an expression without evaluating it.
    e <- expr(a + b)

QUASIQUOTATION
Quoting some parts of an expression while evaluating and then inserting the results of others (unquoting others).
    e <- expr(a + b)
    expr(log(e))      # log(e)
    expr(log(!!e))    # log(a + b)

Programming Recipes

Quoting function - A function that quotes any of its arguments internally for delayed evaluation in a chosen environment. You must take special steps to program safely with a quoting function.

How to spot a quoting function?
A function quotes an argument if the argument returns an error when run on its own.
    dplyr::filter(cars, speed == 25)
    #   speed dist
    # 1    25   85
    speed == 25
    # Error!
Many tidyverse functions are quoting functions: e.g. filter, select, mutate, summarise, etc.

WRITE A FUNCTION THAT RECOGNIZES QUASIQUOTATION (!!,!!!,:=)
1. Capture the quasiquotation-aware argument with rlang::enquo.
2. Evaluate the arg with rlang::eval_tidy.
    add1 <- function(x) {
      q <- rlang::enquo(x)       # 1
      rlang::eval_tidy(q) + 1    # 2
    }
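A short sketch of the two ideas above, assuming rlang is installed: the first part contrasts quotation with quasiquotation, the second calls the sheet's add1 both with ordinary code and with an unquoted value. The variable names plain, unquoted and v are illustrative only.

```r
library(rlang)

e <- expr(a + b)

plain    <- expr(log(e))    # quotation keeps the symbol e as written
unquoted <- expr(log(!!e))  # quasiquotation inserts the contents of e

print(plain)     # log(e)
print(unquoted)  # log(a + b)

# add1 as defined on the sheet: capture with enquo, evaluate with eval_tidy.
add1 <- function(x) {
  q <- enquo(x)
  eval_tidy(q) + 1
}

v <- 10
print(add1(2 * 3))  # 7
print(add1(!!v))    # 11, because enquo understands !!
```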

rlang provides !!, !!!, and := for doing quasiquotation.

!!, !!!, and := are not functions but syntax (symbols recognized by the functions they are passed to). Compare this to how
    . is used by magrittr::%>%()
    . is used by stats::lm()
    .x is used by purrr::map(), and so on.
!!, !!!, and := are only recognized by some rlang functions and functions that use those functions (such as tidyverse functions).

!! Unquotes the symbol or call that follows. Pronounced "unquote" or "bang-bang."
    a <- 1; b <- 2
    expr(log(!!a + b))      # log(1 + b)

Combine !! with () to unquote a longer expression.
    a <- 1; b <- 2
    expr(log(!!(a + b)))    # log(3)

!!! Unquotes a vector or list and splices the results as arguments into the surrounding call. Pronounced "unquote splice" or "bang-bang-bang."
    x <- list(8, b = 2)
    expr(log(!!!x))         # log(8, b = 2)

:= Replaces an = to allow unquoting within the name that appears on the left hand side of the =. Use with !!
    n <- expr(uno)
    tibble::tibble(!!n := 1)    # uno = 1

PROGRAM WITH A QUOTING FUNCTION
    data_mean <- function(data, var) {
      require(dplyr)
      var <- rlang::enquo(var)             # 1
      data %>%
        summarise(mean = mean(!!var))      # 2
    }
1. Capture user argument that will be quoted with rlang::enquo.
2. Unquote the user argument into the quoting function with !!.

PASS MULTIPLE ARGUMENTS TO A QUOTING FUNCTION
    group_mean <- function(data, var, ...) {
      require(dplyr)
      var <- rlang::enquo(var)
      group_vars <- rlang::enquos(...)     # 1
      data %>%
        group_by(!!!group_vars) %>%        # 2
        summarise(mean = mean(!!var))
    }
1. Capture user arguments that will be quoted with rlang::enquos.
2. Unquote splice the user arguments into the quoting function with !!!.

PASS TO ARGUMENT NAMES OF A QUOTING FUNCTION
    named_mean <- function(data, var) {
      require(dplyr)
      var <- rlang::ensym(var)             # 1
      data %>%
        summarise(!!var := mean(!!var))    # 2
    }
1. Capture user argument that will be quoted with rlang::ensym.
2. Unquote the name into the quoting function with !! and :=.

MODIFY USER ARGUMENTS
    my_do <- function(f, v, df) {
      f <- rlang::enquo(f)                 # 1
      v <- rlang::enquo(v)
      todo <- rlang::quo((!!f)(!!v))       # 2
      rlang::eval_tidy(todo, df)           # 3
    }
1. Capture user arguments with rlang::enquo.
2. Unquote user arguments into a new expression or quosure to use.
3. Evaluate the new expression/quosure instead of the original argument.

APPLY AN ARGUMENT TO A DATA FRAME
    subset2 <- function(df, rows) {
      rows <- rlang::enquo(rows)                   # 1
      vals <- rlang::eval_tidy(rows, data = df)    # 2
      df[vals, , drop = FALSE]
    }
1. Capture user argument with rlang::enquo.
2. Evaluate the argument with rlang::eval_tidy. Pass the data frame to data to use as a data mask.
3. Suggest in your documentation that your users use the .data and .env pronouns.

PASS CRAN CHECK
    #' @importFrom rlang .data               # 1
    mutate_y <- function(df) {
      dplyr::mutate(df, y = .data$a + 1)     # 2
    }
Quoted arguments in tidyverse functions can trigger an R CMD check NOTE about undefined global variables. To avoid this:
1. Import rlang::.data to your package, perhaps with the roxygen2 tag @importFrom rlang .data
2. Use the .data$ pronoun in front of variable names in tidyverse functions
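The splice and name operators can be tried without any tidyverse package. The sketch below assumes only rlang; it uses rlang::list2 in place of the sheet's tibble::tibble, and a string rather than a symbol for the unquoted name. Both substitutions are choices made for this illustration, not the sheet's own example.

```r
library(rlang)

# !!! splices each element of x into the call as a separate argument,
# keeping the name b.
x <- list(8, b = 2)
spliced <- expr(log(!!!x))
print(spliced)  # log(8, b = 2)

# := allows the name on the left-hand side to be unquoted with !!.
n <- "uno"
named <- list2(!!n := 1)
print(names(named))  # "uno"
```

The same two operators are what group_mean and named_mean rely on when they forward user input to group_by and summarise.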

caret Package Cheat Sheet

Specifying the Model

Possible syntaxes for specifying the variables in the model:
    train(y ~ x1 + x2, data = dat, ...)
    train(x = predictor_df, y = outcome_vector, ...)
    train(recipe_object, data = dat, ...)
• rfe, sbf, gafs, and safs only have the x/y interface.
• The train formula method will always create dummy variables.
• The x/y interface to train will not create dummy variables (but the underlying model function might).
Remember to:
• Have column names in your data.
• Use factors for a classification outcome (not 0/1 or integers).
• Have valid R names for class levels (not "0"/"1").
• Set the random number seed prior to calling train repeatedly to get the same resamples across calls.
• Use the train option na.action = na.pass if you will be imputing missing data. Also, use this option when predicting new data containing missing values.
To pass options to the underlying model function, you can pass them to train via the ellipses:
    train(y ~ ., data = dat, method = "rf",
          # options to `randomForest`:
          importance = TRUE)

Parallel Processing

The foreach package is used to run models in parallel. The train code does not change but a "do" package must be called first.
    # on MacOS or Linux
    library(doMC)
    registerDoMC(cores=4)

    # on Windows
    library(doParallel)
    cl <- makeCluster(2)
    registerDoParallel(cl)
The function parallel::detectCores can help too.

Preprocessing

Transformations, filters, and other operations can be applied to the predictors with the preProc option.
    train(, preProc = c("method1", "method2"), ...)
Methods include:
• "center", "scale", and "range" to normalize predictors.
• "BoxCox", "YeoJohnson", or "expoTrans" to transform predictors.
• "knnImpute", "bagImpute", or "medianImpute" to impute.
• "corr", "nzv", "zv", and "conditionalX" to filter.
• "pca", "ica", or "spatialSign" to transform groups.
train determines the order of operations; the order that the methods are declared does not matter.
The recipes package has a more extensive list of preprocessing operations.

Adding Options

Many train options can be specified using the trainControl function:
    train(y ~ ., data = dat, method = "cubist",
          trControl = trainControl())

Resampling Options

trainControl is used to choose a resampling method:
    trainControl(method = , )
Methods and options are:
• "cv" for K-fold cross-validation (number sets the # folds).
• "repeatedcv" for repeated cross-validation (repeats for # repeats).
• "boot" for bootstrap (number sets the iterations).
• "LGOCV" for leave-group-out (number and p are options).
• "LOO" for leave-one-out cross-validation.
• "oob" for out-of-bag resampling (only for some models).
• "timeslice" for time-series data (options are initialWindow, horizon, fixedWindow, and skip).

Performance Metrics

To choose how to summarize a model, the trainControl function is used again.
    trainControl(summaryFunction = , classProbs = )
Custom R functions can be used but caret includes several: defaultSummary (for accuracy, RMSE, etc), twoClassSummary (for ROC curves), and prSummary (for information retrieval). For the last two functions, the option classProbs must be set to TRUE.

Grid Search

To let train determine the values of the tuning parameter(s), the tuneLength option controls how many values per tuning parameter to evaluate.
Alternatively, specific values of the tuning parameters can be declared using the tuneGrid argument:
    grid <- expand.grid(alpha = c(0.1, 0.5, 0.9),
                        lambda = c(0.001, 0.01))
    train(x = x, y = y, method = "glmnet",
          preProc = c("center", "scale"),
          tuneGrid = grid)

Random Search

For tuning, train can also generate random tuning parameter combinations over a wide range. tuneLength controls the total number of combinations to evaluate. To use random search:
    trainControl(search = "random")

Subsampling

With a large class imbalance, train can subsample the data to balance the classes prior to model fitting.
    trainControl(sampling = "down")
Other values are "up", "smote", or "rose". The latter two may require additional package installs.
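As a rough illustration of what tuneGrid receives, the sketch below builds the sheet's glmnet grid with base R and counts its rows. The trainControl call is guarded so the snippet still runs where caret is not installed; the choice of 5-fold cross-validation is an assumption made for this example, not a recommendation from the sheet.

```r
# expand.grid crosses every alpha with every lambda, so train would
# evaluate one model configuration per row.
grid <- expand.grid(alpha  = c(0.1, 0.5, 0.9),
                    lambda = c(0.001, 0.01))
n_candidates <- nrow(grid)
print(grid)
print(n_candidates)  # 6

# Optional: a resampling specification to pass as trControl.
# Assumes caret is installed; skipped otherwise.
if (requireNamespace("caret", quietly = TRUE)) {
  ctrl <- caret::trainControl(method = "cv", number = 5)
  print(ctrl$method)
}
```

With this grid and 5-fold cross-validation, each of the 6 candidate settings is resampled 5 times; a subsequent call would pass tuneGrid = grid and trControl = ctrl to train.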

CC BY SA Max Kuhn • [email protected] • https://github.com/topepo/ • Learn more at https://topepo.github.io/caret/ • Updated: 9/17

RStudio® is a trademark of RStudio, Inc. • CC BY RStudio • [email protected] • 844-448-1212 • rstudio.com • Learn more at www.rstudio.com

RStudio Community   rstd.io/community
Developer Blog      rstd.io/dev-blog
R Views Blog        rstd.io/rviews-blog
Tidyverse Blog      rstd.io/tidy-blog
Tensorflow Blog     rstd.io/tf-blog
Twitter             rstd.io/twitter
GitHub              rstd.io/github
LinkedIn            rstd.io/linkedin
YouTube             rstd.io/youtube
Facebook            rstd.io/facebook
