R Syntax Comparison : : CHEAT SHEET
Dollar sign syntax:   goal(data$x, data$y)
Formula syntax:       goal(y~x|z, data=data, group=w)    (~ is the tilde)
Tidyverse syntax:     data %>% goal(x)                   (%>% is the pipe)

The variety of R syntaxes gives you many ways to “say” the same thing. Read across the three syntaxes under each task to see how they approach the same problem.

Prefixes such as dplyr:: show which package a function comes from. Unprefixed helpers (%>%, n(), if_else(), aes(), geom_point(), facet_grid()) assume the package has been attached with library().

SUMMARY STATISTICS:

one continuous variable:
    Dollar sign:  mean(mtcars$mpg)
    Formula:      mosaic::mean(~mpg, data=mtcars)
    Tidyverse:    mtcars %>% dplyr::summarize(mean(mpg))

one categorical variable:
    Dollar sign:  table(mtcars$cyl)
    Formula:      mosaic::tally(~cyl, data=mtcars)
    Tidyverse:    mtcars %>% dplyr::group_by(cyl) %>%
                    dplyr::summarize(n())

two categorical variables:
    Dollar sign:  table(mtcars$cyl, mtcars$am)
    Formula:      mosaic::tally(cyl~am, data=mtcars)
    Tidyverse:    mtcars %>% dplyr::group_by(cyl, am) %>%
                    dplyr::summarize(n())

one continuous, one categorical:
    Dollar sign:  mean(mtcars$mpg[mtcars$cyl==4])
                  mean(mtcars$mpg[mtcars$cyl==6])
                  mean(mtcars$mpg[mtcars$cyl==8])
    Formula:      mosaic::mean(mpg~cyl, data=mtcars)
    Tidyverse:    mtcars %>% dplyr::group_by(cyl) %>%
                    dplyr::summarize(mean(mpg))

PLOTTING:

one continuous variable:
    Dollar sign:  hist(mtcars$disp)
                  boxplot(mtcars$disp)
    Formula:      lattice::histogram(~disp, data=mtcars)
                  lattice::bwplot(~disp, data=mtcars)
    Tidyverse:    ggplot2::qplot(x=mpg, data=mtcars, geom="histogram")
                  ggplot2::qplot(y=disp, x=1, data=mtcars, geom="boxplot")

one categorical variable:
    Dollar sign:  barplot(table(mtcars$cyl))
    Formula:      mosaic::bargraph(~cyl, data=mtcars)
    Tidyverse:    ggplot2::qplot(x=cyl, data=mtcars, geom="bar")

two continuous variables:
    Dollar sign:  plot(mtcars$disp, mtcars$mpg)
    Formula:      lattice::xyplot(mpg~disp, data=mtcars)
    Tidyverse:    ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")

two categorical variables:
    Dollar sign:  mosaicplot(table(mtcars$am, mtcars$cyl))
    Formula:      mosaic::bargraph(~am, data=mtcars, group=cyl)
    Tidyverse:    ggplot2::qplot(x=factor(cyl), data=mtcars, geom="bar") +
                    facet_grid(.~am)

one continuous, one categorical:
    Dollar sign:  hist(mtcars$disp[mtcars$cyl==4])
                  hist(mtcars$disp[mtcars$cyl==6])
                  hist(mtcars$disp[mtcars$cyl==8])
                  boxplot(mtcars$disp[mtcars$cyl==4])
                  boxplot(mtcars$disp[mtcars$cyl==6])
                  boxplot(mtcars$disp[mtcars$cyl==8])
    Formula:      lattice::histogram(~disp|cyl, data=mtcars)
                  lattice::bwplot(cyl~disp, data=mtcars)
    Tidyverse:    ggplot2::qplot(x=disp, data=mtcars, geom="histogram") +
                    facet_grid(.~cyl)
                  ggplot2::qplot(y=disp, x=factor(cyl), data=mtcars,
                    geom="boxplot")

WRANGLING:

subsetting:
    Dollar sign:  mtcars[mtcars$mpg>30, ]
    Tidyverse:    mtcars %>% dplyr::filter(mpg>30)

making a new variable:
    Dollar sign:  mtcars$efficient[mtcars$mpg>30] <- TRUE
                  mtcars$efficient[mtcars$mpg<=30] <- FALSE
    Tidyverse:    mtcars <- mtcars %>%
                    dplyr::mutate(efficient = if_else(mpg>30, TRUE, FALSE))

RStudio® is a trademark of RStudio, Inc. • CC BY Amelia McNamara • [email protected] • @AmeliaMN • science.smith.edu/~amcnamara/ • Updated: 2018-01

Syntax is the set of rules that govern what code works and doesn’t work in a programming language. Most programming languages offer one standardized syntax, but R allows package developers to specify their own syntax. As a result, there is a large variety of (equally valid) R syntaxes.

The three most prevalent R syntaxes are:

1. The dollar sign syntax, sometimes called base R syntax, expected by most base R functions. It is characterized by the use of dataset$variablename, and is also associated with square bracket subsetting, as in dataset[1,2]. Almost all R functions will accept things passed to them in dollar sign syntax.
2. The formula syntax, used by modeling functions like lm(), lattice graphics, and mosaic summary statistics. It uses the tilde (~) to connect a response variable and one (or many) predictors. Many base R functions will accept formula syntax.
3. The tidyverse syntax, used by dplyr, tidyr, and more. These functions expect data to be the first argument, which allows them to work with the “pipe” (%>%) from the magrittr package. Typically, ggplot2 is thought of as part of the tidyverse, although it has its own flavor of the syntax using plus signs (+) to string pieces together. ggplot2 author Hadley Wickham has said the package would have had different syntax if he had written it after learning about the pipe.

Educators often try to teach within one unified syntax, but most R programmers use some combination of all the syntaxes.

Internet research tip:
If you are searching on Google, StackOverflow, or another favorite online source and see code in a syntax you don’t recognize:
• Check to see if the code is using one of the three common syntaxes listed on this cheatsheet
• Try your search again, using a keyword from the syntax name (“tidyverse”) or a relevant package (“mosaic”)

Even more ways to say the same thing

Even within one syntax, there are often variations that are equally valid. As a case study, let’s look at the ggplot2 syntax. ggplot2 is the plotting package that lives within the tidyverse. Read down this section for many pieces of code in one syntax that look different but produce the same graphic.

!  Sometimes particular syntaxes work, but are considered dangerous to use, because they are so easy to get wrong. For example, passing variable names without assigning them to a named argument. Lines marked with ! below are examples.

quickplot
qplot() stands for quickplot, and allows you to make quick plots. It doesn’t have the full power of ggplot2, and it uses a slightly different syntax than the rest of the package.

       ggplot2::qplot(x=disp, y=mpg, data=mtcars, geom="point")
    !  ggplot2::qplot(x=disp, y=mpg, data=mtcars)
    !! ggplot2::qplot(disp, mpg, data=mtcars)

ggplot
To unlock the power of ggplot2, you need to use the ggplot() function (which sets up a plotting region) and add geoms to the plot. The plus sign (+) adds layers.

    ggplot2::ggplot(mtcars) +
      geom_point(aes(x=disp, y=mpg))

    ggplot2::ggplot(data=mtcars) +
      geom_point(mapping=aes(x=disp, y=mpg))

    ggplot2::ggplot(mtcars, aes(x=disp, y=mpg)) +
      geom_point()

    ggplot2::ggplot(mtcars, aes(x=disp)) +
      geom_point(aes(y=mpg))

ggformula
The “third and a half way”: use the formula syntax, but get ggplot2-style graphics.

    ggformula::gf_point(mpg~disp, data=mtcars)

formulas in base plots
Base R plots will also take the formula syntax, although it's not as commonly used.

    plot(mpg~disp, data=mtcars)
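One way to convince yourself that the syntaxes really “say” the same thing is to compute the same quantity several ways and compare the results. The sketch below does this for mean mpg by number of cylinders. It is an illustration, not part of the original cheat sheet: the names dollar_means, formula_means, and tidy_means are made up for this example, and aggregate() is used as one base R function that accepts a formula. The tidyverse check assumes dplyr is installed and is skipped otherwise.

```r
# Dollar sign syntax: one subset per group, as on the cheat sheet.
dollar_means <- c(
  mean(mtcars$mpg[mtcars$cyl == 4]),
  mean(mtcars$mpg[mtcars$cyl == 6]),
  mean(mtcars$mpg[mtcars$cyl == 8])
)

# Formula syntax with a base R function: response ~ grouping variable.
# aggregate() returns one row per cyl value, sorted by cyl.
formula_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Both approaches should give the same three group means.
stopifnot(isTRUE(all.equal(dollar_means, formula_means$mpg)))

# Tidyverse syntax, run only if dplyr is installed (assumption: it may not be).
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))  # attaches %>%, group_by(), ...
  tidy_means <- mtcars %>%
    group_by(cyl) %>%
    summarize(mean_mpg = mean(mpg))
  stopifnot(isTRUE(all.equal(dollar_means, tidy_means$mean_mpg)))
}

print(formula_means)
```

The dollar sign version needs one line per group, while the formula and tidyverse versions handle every level of cyl in a single call, which is the main practical difference once a variable has more than a few categories.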