R Syntax Comparison :: CHEAT SHEET

Dollar sign syntax: goal(data$x, data$y)
Formula syntax: goal(y~x|z, data=data, group=w)
Tidyverse syntax: data %>% goal(x)

SUMMARY STATISTICS:

one continuous variable:
  Dollar sign: mean(mtcars$mpg)
  Formula: mosaic::mean(~mpg, data=mtcars)
  Tidyverse: mtcars %>% dplyr::summarize(mean(mpg))

one categorical variable:
  Dollar sign: table(mtcars$cyl)
  Formula: mosaic::tally(~cyl, data=mtcars)
  Tidyverse: mtcars %>% dplyr::group_by(cyl) %>% dplyr::summarize(n())

two categorical variables:
  Dollar sign: table(mtcars$cyl, mtcars$am)
  Formula: mosaic::tally(cyl~am, data=mtcars)
  Tidyverse: mtcars %>% dplyr::group_by(cyl, am) %>% dplyr::summarize(n())

one continuous, one categorical:
  Dollar sign:
    mean(mtcars$mpg[mtcars$cyl==4])
    mean(mtcars$mpg[mtcars$cyl==6])
    mean(mtcars$mpg[mtcars$cyl==8])
  Formula: mosaic::mean(mpg~cyl, data=mtcars)
  Tidyverse: mtcars %>% dplyr::group_by(cyl) %>% dplyr::summarize(mean(mpg))

In the code above, %>% is the pipe and ~ is the tilde.
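The "one continuous, one categorical" summary can also be written without any add-on packages. The following is a minimal sketch, not part of the original cheat sheet: it assumes only base R, using tapply() for the dollar sign style and aggregate() for the formula style, and the object names are illustrative.

```r
# Dollar sign syntax: one call instead of three subsetted mean() calls.
by_cyl_dollar <- tapply(mtcars$mpg, mtcars$cyl, mean)

# Formula syntax using only base R (no mosaic needed).
by_cyl_formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Both results hold the group means in the same order of cyl (4, 6, 8).
same_means <- isTRUE(all.equal(as.vector(by_cyl_dollar), by_cyl_formula$mpg))
print(by_cyl_dollar)
print(by_cyl_formula)
print(same_means)
```

tapply() returns a named array while aggregate() returns a data frame, so the two results differ in shape even though the numbers agree.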

PLOTTING:

one continuous variable:
  Dollar sign:
    hist(mtcars$disp)
    boxplot(mtcars$disp)
  Formula:
    lattice::histogram(~disp, data=mtcars)
    lattice::bwplot(~disp, data=mtcars)
  Tidyverse:
    ggplot2::qplot(x=disp, data=mtcars, geom="histogram")
    ggplot2::qplot(y=disp, x=1, data=mtcars, geom="boxplot")

one categorical variable:
  Dollar sign: barplot(table(mtcars$cyl))
  Formula: mosaic::bargraph(~cyl, data=mtcars)
  Tidyverse: ggplot2::qplot(x=cyl, data=mtcars, geom="bar")

two continuous variables:
  Dollar sign: plot(mtcars$disp, mtcars$mpg)
  Formula: lattice::xyplot(mpg~disp, data=mtcars)
  Tidyverse: ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")

two categorical variables:
  Dollar sign: mosaicplot(table(mtcars$am, mtcars$cyl))
  Formula: mosaic::bargraph(~am, data=mtcars, group=cyl)
  Tidyverse: ggplot2::qplot(x=factor(cyl), data=mtcars, geom="bar") + facet_grid(.~am)

one continuous, one categorical:
  Dollar sign:
    hist(mtcars$disp[mtcars$cyl==4])
    hist(mtcars$disp[mtcars$cyl==6])
    hist(mtcars$disp[mtcars$cyl==8])
    boxplot(mtcars$disp[mtcars$cyl==4])
    boxplot(mtcars$disp[mtcars$cyl==6])
    boxplot(mtcars$disp[mtcars$cyl==8])
  Formula:
    lattice::histogram(~disp|cyl, data=mtcars)
    lattice::bwplot(cyl~disp, data=mtcars)
  Tidyverse:
    ggplot2::qplot(x=disp, data=mtcars, geom="histogram") + facet_grid(.~cyl)
    ggplot2::qplot(y=disp, x=factor(cyl), data=mtcars, geom="boxplot")

WRANGLING:

subsetting:
  Dollar sign: mtcars[mtcars$mpg>30, ]
  Tidyverse: mtcars %>% dplyr::filter(mpg>30)

making a new variable:
  Dollar sign:
    mtcars$efficient[mtcars$mpg>30] <- TRUE
    mtcars$efficient[mtcars$mpg<=30] <- FALSE
  Tidyverse:
    mtcars <- mtcars %>% dplyr::mutate(efficient = if_else(mpg>30, TRUE, FALSE))

The variety of syntaxes gives you many ways to “say” the same thing. Read across the cheatsheet to see how different syntaxes approach the same problem.
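Both "making a new variable" recipes can be reduced to a single logical comparison, because mpg > 30 already evaluates to TRUE or FALSE for every row. The sketch below is an illustration rather than part of the original cheat sheet: it uses only base R, and it works on a copy named cars (a hypothetical name) so that the built-in mtcars is left unchanged.

```r
# Work on a copy so the built-in mtcars data set is not modified.
cars <- mtcars

# The comparison itself yields a logical vector, one value per row.
cars$efficient <- cars$mpg > 30

# Subsetting with the new variable matches subsetting with the condition.
by_flag <- cars[cars$efficient, ]
by_condition <- cars[cars$mpg > 30, ]
same_rows <- identical(rownames(by_flag), rownames(by_condition))

print(table(cars$efficient))
print(rownames(by_flag))
print(same_rows)
```

Assigning the comparison directly also avoids leaving NA in rows that neither of two separate assignments happens to cover.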

RStudio® is a trademark of RStudio, Inc. • CC BY Amelia McNamara • [email protected] • @AmeliaMN • science.smith.edu/~amcnamara/ • Updated: 2018-01

Syntax is the set of rules that govern what code works and doesn’t work in a programming language. Most programming languages offer one standardized syntax, but R allows package developers to specify their own syntax. As a result, there is a large variety of (equally valid) R syntaxes.

The three most prevalent R syntaxes are:

1. The dollar sign syntax, sometimes called base R syntax, expected by most base R functions. It is characterized by the use of dataset$variablename, and is also associated with square bracket subsetting, as in dataset[1,2]. Almost all R functions will accept things passed to them in dollar sign syntax.
2. The formula syntax, used by modeling functions like lm(), lattice graphics, and mosaic summary statistics. It uses the tilde (~) to connect a response variable and one (or many) predictors. Many base R functions will accept formula syntax.
3. The tidyverse syntax used by dplyr, tidyr, and more. These functions expect data to be the first argument, which allows them to work with the “pipe” (%>%) from the magrittr package. Typically, ggplot2 is thought of as part of the tidyverse, although it has its own flavor of the syntax using plus signs (+) to string pieces together. The ggplot2 author has said the package would have had different syntax if he had written it after learning about the pipe.

Educators often try to teach within one unified syntax, but most R programmers use some combination of all the syntaxes.

Internet research tip:
If you are searching on Google, StackOverflow, or another favorite online source and see code in a syntax you don’t recognize:
• Check to see if the code is using one of the three common syntaxes listed on this cheatsheet
• Try your search again, using a keyword from the syntax name (“tidyverse”) or a relevant package (“mosaic”)

Even more ways to say the same thing

Even within one syntax, there are often variations that are equally valid. As a case study, let’s look at the ggplot2 syntax. ggplot2 is the plotting package that lives within the tidyverse. If you read down this column, all the code here produces the same graphic: many pieces of code in one syntax that look different but produce the same result.

quickplot
qplot() stands for quickplot, and allows you to make quick plots. It doesn’t have the full power of ggplot2, and it uses a slightly different syntax than the rest of the package.

  ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")
  ggplot2::qplot(x=disp, y=mpg, data=mtcars) !
  ggplot2::qplot(disp, mpg, data=mtcars) ! !

ggplot
To unlock the power of ggplot2, you need to use the ggplot() function (which sets up a plotting region) and add geoms to the plot. The plus sign adds layers.

  ggplot2::ggplot(mtcars) + geom_point(aes(x=disp, y=mpg))
  ggplot2::ggplot(data=mtcars) + geom_point(mapping=aes(x=disp, y=mpg))
  ggplot2::ggplot(mtcars, aes(x=disp, y=mpg)) + geom_point()
  ggplot2::ggplot(mtcars, aes(x=disp)) + geom_point(aes(y=mpg))

ggformula
The “third and a half way” to use the formula syntax, but get ggplot2-style graphics.

  ggformula::gf_point(mpg~disp, data=mtcars)

formulas in base plots
Base R plots will also take the formula syntax, although it's not as commonly used.

  plot(mpg~disp, data=mtcars)

! Sometimes particular syntaxes work, but are considered dangerous to use, because they are so easy to get wrong. For example, passing variable names without assigning them to a named argument.
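The warning about unnamed arguments can be made concrete with a small sketch. It is not from the original cheat sheet. The helper slope_of() is hypothetical and exists only for this illustration: it takes an x and a y vector and returns the fitted slope from lm().

```r
# Hypothetical helper for illustration only: slope of y regressed on x.
slope_of <- function(x, y) {
  unname(coef(lm(y ~ x))[2])
}

# Named arguments can be written in any order and mean the same thing.
named <- slope_of(x = mtcars$disp, y = mtcars$mpg)
named_reordered <- slope_of(y = mtcars$mpg, x = mtcars$disp)

# Unnamed arguments are matched by position, so swapping them by mistake
# runs without error but silently answers a different question.
positional_swapped <- slope_of(mtcars$mpg, mtcars$disp)

print(identical(named, named_reordered))
print(isTRUE(all.equal(named, positional_swapped)))
```

Nothing signals that the swapped call answers a different question, which is why named arguments are the safer habit.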
