
Data Visualization with ggplot2 : : CHEAT SHEET

ggplot2 is based on the grammar of graphics, the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms, visual marks that represent data points.

[Figure: data + geom (x = F, y = A) + coordinate system = plot]

To display values, map variables in the data to visual properties of the geom (aesthetics) like size, color, and x and y locations.

[Figure: data + geom (x = F, y = A, color = F, size = A) + coordinate system = plot]

Complete the template below to build a graph.

ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>),
    stat = <STAT>, position = <POSITION>) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION> +
  <SCALE_FUNCTION> +
  <THEME_FUNCTION>

The data, the geom function, and the mappings are required. Everything from stat onward is not required; sensible defaults are supplied.

ggplot(data = mpg, aes(x = cty, y = hwy))
  Begins a plot that you finish by adding layers to. Add one geom function per layer.

qplot(x = cty, y = hwy, data = mpg, geom = "point")
  Creates a complete plot with given data, geom, and mappings. Supplies many useful defaults.

last_plot()
  Returns the last plot.

ggsave("plot.png", width = 5, height = 5)
  Saves last plot as a 5 in x 5 in file named "plot.png" in the working directory. Matches file type to file extension.

Geoms

Use a geom function to represent data points, use the geom's aesthetic properties to represent variables. Each function returns a layer. The aesthetics each geom understands are listed below its call.

GRAPHICAL PRIMITIVES

a <- ggplot(economics, aes(date, unemploy))
b <- ggplot(seals, aes(x = long, y = lat))

a + geom_blank()
  (Useful for expanding limits)
b + geom_curve(aes(yend = lat + 1, xend = long + 1), curvature = 1)
  x, xend, y, yend, alpha, angle, color, curvature, linetype, size
a + geom_path(lineend = "butt", linejoin = "round", linemitre = 1)
  x, y, alpha, color, group, linetype, size
a + geom_polygon(aes(group = group))
  x, y, alpha, color, fill, group, linetype, size
b + geom_rect(aes(xmin = long, ymin = lat, xmax = long + 1, ymax = lat + 1))
  xmax, xmin, ymax, ymin, alpha, color, fill, linetype, size
a + geom_ribbon(aes(ymin = unemploy - 900, ymax = unemploy + 900))
  x, ymax, ymin, alpha, color, fill, group, linetype, size

LINE SEGMENTS

common aesthetics: x, y, alpha, color, linetype, size

b + geom_abline(aes(intercept = 0, slope = 1))
b + geom_hline(aes(yintercept = lat))
b + geom_vline(aes(xintercept = long))
b + geom_segment(aes(yend = lat + 1, xend = long + 1))
b + geom_spoke(aes(angle = 1:1155, radius = 1))

ONE VARIABLE

continuous
c <- ggplot(mpg, aes(hwy)); c2 <- ggplot(mpg)

c + geom_area(stat = "bin")
  x, y, alpha, color, fill, linetype, size
c + geom_density(kernel = "gaussian")
  x, y, alpha, color, fill, group, linetype, size, weight
c + geom_dotplot()
  x, y, alpha, color, fill
c + geom_freqpoly()
  x, y, alpha, color, group, linetype, size
c + geom_histogram(binwidth = 5)
  x, y, alpha, color, fill, linetype, size, weight
c2 + geom_qq(aes(sample = hwy))
  x, y, alpha, color, fill, linetype, size, weight

discrete
d <- ggplot(mpg, aes(fl))

d + geom_bar()
  x, alpha, color, fill, linetype, size, weight

TWO VARIABLES

continuous x, continuous y
e <- ggplot(mpg, aes(cty, hwy))

e + geom_label(aes(label = cty), nudge_x = 1, nudge_y = 1)
  x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust
e + geom_jitter(height = 2, width = 2)
  x, y, alpha, color, fill, shape, size
e + geom_point()
  x, y, alpha, color, fill, shape, size, stroke
e + geom_quantile()
  x, y, alpha, color, group, linetype, size, weight
e + geom_rug(sides = "bl")
  x, y, alpha, color, linetype, size
e + geom_smooth(method = lm)
  x, y, alpha, color, fill, group, linetype, size, weight
e + geom_text(aes(label = cty), nudge_x = 1, nudge_y = 1, check_overlap = TRUE)
  x, y, label, alpha, angle, color, family, fontface, hjust, lineheight, size, vjust

discrete x, continuous y
f <- ggplot(mpg, aes(class, hwy))

f + geom_col()
  x, y, alpha, color, fill, group, linetype, size
f + geom_boxplot()
  x, y, lower, middle, upper, ymax, ymin, alpha, color, fill, group, linetype, shape, size, weight
f + geom_dotplot(binaxis = "y", stackdir = "center")
  x, y, alpha, color, fill, group
f + geom_violin(scale = "area")
  x, y, alpha, color, fill, group, linetype, size, weight

discrete x, discrete y
g <- ggplot(diamonds, aes(cut, color))

g + geom_count()
  x, y, alpha, color, fill, shape, size, stroke

continuous bivariate distribution
h <- ggplot(diamonds, aes(carat, price))

h + geom_bin2d(binwidth = c(0.25, 500))
  x, y, alpha, color, fill, linetype, size, weight
h + geom_density2d()
  x, y, alpha, colour, group, linetype, size
h + geom_hex()
  x, y, alpha, colour, fill, size

continuous function
i <- ggplot(economics, aes(date, unemploy))

i + geom_area()
  x, y, alpha, color, fill, linetype, size
i + geom_line()
  x, y, alpha, color, group, linetype, size
i + geom_step(direction = "hv")
  x, y, alpha, color, group, linetype, size

visualizing error
df <- data.frame(grp = c("A", "B"), fit = 4:5, se = 1:2)
j <- ggplot(df, aes(grp, fit, ymin = fit - se, ymax = fit + se))

j + geom_crossbar(fatten = 2)
  x, y, ymax, ymin, alpha, color, fill, group, linetype, size
j + geom_errorbar()
  x, ymax, ymin, alpha, color, group, linetype, size, width (also geom_errorbarh())
j + geom_linerange()
  x, ymin, ymax, alpha, color, group, linetype, size
j + geom_pointrange()
  x, y, ymin, ymax, alpha, color, fill, group, linetype, shape, size

maps
data <- data.frame(murder = USArrests$Murder, state = tolower(rownames(USArrests)))
map <- map_data("state")
k <- ggplot(data, aes(fill = murder))

k + geom_map(aes(map_id = state), map = map) + expand_limits(x = map$long, y = map$lat)
  map_id, alpha, color, fill, linetype, size

THREE VARIABLES

seals$z <- with(seals, sqrt(delta_long^2 + delta_lat^2))
l <- ggplot(seals, aes(long, lat))

l + geom_contour(aes(z = z))
  x, y, z, alpha, colour, group, linetype, size, weight
l + geom_raster(aes(fill = z), hjust = 0.5, vjust = 0.5, interpolate = FALSE)
  x, y, alpha, fill
l + geom_tile(aes(fill = z))
  x, y, alpha, color, fill, linetype, size, width
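
As a minimal sketch of how the template and the "one geom function per layer" rule fit together, the example below fills in only the required parts (data, geom function, mappings) and then adds a second layer. The object names p, p2, and n_layers are illustrative and not part of the cheat sheet; the code assumes ggplot2 is installed and uses its bundled mpg data.

```r
library(ggplot2)

# Fill in the template: data = mpg, geom function = geom_point,
# mappings = cty on x and hwy on y. stat and position keep their defaults.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy, color = class), alpha = 0.6)

# Adding a second geom function adds a second layer. This layer supplies its
# own mapping because none was set in ggplot().
p2 <- p +
  geom_smooth(mapping = aes(x = cty, y = hwy), method = "lm", formula = y ~ x)

# Count the layers held by each plot object.
n_layers <- c(first = length(p$layers), second = length(p2$layers))
print(n_layers)
```

Printing p2 at the console (or calling ggsave() afterwards) would render both layers on the same coordinate system.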

RStudio® is a trademark of RStudio, Inc. • CC BY SA RStudio • info@.com • 844-448-1212 • rstudio.com • Learn more at http://ggplot2.tidyverse.org • ggplot2 3.1.0 • Updated: 2018-12

Stats

An alternative way to build a layer.

A stat builds new variables to plot (e.g., count, prop).

[Figure: data + stat + geom (x = x, y = ..count..) + coordinate system = plot]

Visualize a stat by changing the default stat of a geom function, geom_bar(stat = "count"), or by using a stat function, stat_count(geom = "bar"), which calls a default geom to make a layer (equivalent to a geom function). Use ..name.. syntax to map stat variables to aesthetics.

i + stat_density2d(aes(fill = ..level..), geom = "polygon")
  stat_density2d: stat function; aes(): geom mappings; ..level..: variable created by stat; geom = "polygon": geom to use

Each entry below lists the aesthetics the stat uses, then, after the bar, the variables it creates.

c + stat_bin(binwidth = 1, boundary = 10)
  x, y | ..count.., ..ncount.., ..density.., ..ndensity..
c + stat_count(width = 1)
  x, y | ..count.., ..prop..
c + stat_density(adjust = 1, kernel = "gaussian")
  x, y | ..count.., ..density.., ..scaled..
e + stat_bin_2d(bins = 30, drop = TRUE)
  x, y, fill | ..count.., ..density..
e + stat_bin_hex(bins = 30)
  x, y, fill | ..count.., ..density..
e + stat_density_2d(contour = TRUE, n = 100)
  x, y, color, size | ..level..
e + stat_ellipse(level = 0.95, segments = 51, type = "t")
l + stat_contour(aes(z = z))
  x, y, z, order | ..level..
l + stat_summary_hex(aes(z = z), bins = 30, fun = max)
  x, y, z, fill | ..value..
l + stat_summary_2d(aes(z = z), bins = 30, fun = mean)
  x, y, z, fill | ..value..
f + stat_boxplot(coef = 1.5)
  x, y | ..lower.., ..middle.., ..upper.., ..width.., ..ymin.., ..ymax..
f + stat_ydensity(kernel = "gaussian", scale = "area")
  x, y | ..density.., ..scaled.., ..count.., ..n.., ..violinwidth.., ..width..
e + stat_ecdf(n = 40)
  x, y | ..x.., ..y..
e + stat_quantile(quantiles = c(0.1, 0.9), formula = y ~ log(x), method = "rq")
  x, y | ..quantile..
e + stat_smooth(method = "lm", formula = y ~ x, se = TRUE, level = 0.95)
  x, y | ..se.., ..x.., ..y.., ..ymin.., ..ymax..
ggplot() + stat_function(aes(x = -3:3), n = 99, fun = dnorm, args = list(sd = 0.5))
  x | ..x.., ..y..
e + stat_identity(na.rm = TRUE)
ggplot() + stat_qq(aes(sample = 1:100), dist = qt, dparam = list(df = 5))
  sample, x, y | ..sample.., ..theoretical..
e + stat_sum()
  x, y, size | ..n.., ..prop..
e + stat_summary(fun.data = "mean_cl_boot")
h + stat_summary_bin(fun.y = "mean", geom = "bar")
e + stat_unique()

Scales

Scales map data values to the visual values of an aesthetic. To change a mapping, add a new scale.

(n <- d + geom_bar(aes(fill = fl)))

n + scale_fill_manual(
  values = c("skyblue", "royalblue", "blue", "navy"),
  limits = c("d", "e", "p", "r"), breaks = c("d", "e", "p", "r"),
  name = "fuel", labels = c("D", "E", "P", "R"))

  scale_: prefix of every scale function
  fill: aesthetic to adjust
  manual: prepackaged scale to use
  values: scale-specific arguments
  limits: range of values to include in mapping
  breaks: breaks to use in legend/axis
  name: title to use in legend/axis
  labels: labels to use in legend/axis

GENERAL PURPOSE SCALES

Use with most aesthetics

scale_*_continuous() - map continuous values to visual ones
scale_*_discrete() - map discrete values to visual ones
scale_*_identity() - use data values as visual ones
scale_*_manual(values = c()) - map discrete values to manually chosen visual ones
scale_*_date(date_labels = "%m/%d", date_breaks = "2 weeks") - treat data values as dates
scale_*_datetime() - treat data values as date times. Use same arguments as scale_x_date(). See ?strptime for label formats.

X & Y LOCATION SCALES

Use with x or y aesthetics (x shown here)

scale_x_log10() - Plot x on log10 scale
scale_x_reverse() - Reverse direction of x axis
scale_x_sqrt() - Plot x on square root scale

COLOR AND FILL SCALES (DISCRETE)

n <- d + geom_bar(aes(fill = fl))

n + scale_fill_brewer(palette = "Blues")
  For palette choices: RColorBrewer::display.brewer.all()
n + scale_fill_grey(start = 0.2, end = 0.8, na.value = "red")

COLOR AND FILL SCALES (CONTINUOUS)

o <- c + geom_dotplot(aes(fill = ..x..))

o + scale_fill_distiller(palette = "Blues")
o + scale_fill_gradient(low = "red", high = "yellow")
o + scale_fill_gradient2(low = "red", high = "blue", mid = "white", midpoint = 25)
o + scale_fill_gradientn(colours = topo.colors(6))
  Also: rainbow(), heat.colors(), terrain.colors(), cm.colors(), RColorBrewer::brewer.pal()

SHAPE AND SIZE SCALES

p <- e + geom_point(aes(shape = fl, size = cyl))

p + scale_shape() + scale_size()
p + scale_shape_manual(values = c(3:7))
p + scale_radius(range = c(1, 6))
p + scale_size_area(max_size = 6)

Coordinate Systems

r <- d + geom_bar()

r + coord_cartesian(xlim = c(0, 5))
  xlim, ylim
  The default cartesian coordinate system
r + coord_fixed(ratio = 1/2)
  ratio, xlim, ylim
  Cartesian coordinates with fixed aspect ratio between x and y units
r + coord_flip()
  xlim, ylim
  Flipped Cartesian coordinates
r + coord_polar(theta = "x", direction = 1)
  theta, start, direction
  Polar coordinates
r + coord_trans(ytrans = "sqrt")
  xtrans, ytrans, limx, limy
  Transformed cartesian coordinates. Set xtrans and ytrans to the name of a window function.
π + coord_quickmap()
π + coord_map(projection = "ortho", orientation = c(41, -74, 0))
  projection, orientation, xlim, ylim
  Map projections from the mapproj package (mercator (default), azequalarea, lagrange, etc.)

Position Adjustments

Position adjustments determine how to arrange geoms that would otherwise occupy the same space.

s <- ggplot(mpg, aes(fl, fill = drv))

s + geom_bar(position = "dodge")
  Arrange elements side by side
s + geom_bar(position = "fill")
  Stack elements on top of one another, normalize height
e + geom_point(position = "jitter")
  Add random noise to X and Y position of each element to avoid overplotting
e + geom_label(position = "nudge")
  Nudge labels away from points
s + geom_bar(position = "stack")
  Stack elements on top of one another

Each position adjustment can be recast as a function with manual width and height arguments.

s + geom_bar(position = position_dodge(width = 1))

Themes

r + theme_bw()
  White background with grid lines
r + theme_gray()
  Grey background (default theme)
r + theme_dark()
  Dark for contrast
r + theme_classic()
r + theme_light()
r + theme_linedraw()
r + theme_minimal()
  Minimal themes
r + theme_void()
  Empty theme

Faceting

Facets divide a plot into subplots based on the values of one or more discrete variables.

t <- ggplot(mpg, aes(cty, hwy)) + geom_point()

t + facet_grid(cols = vars(fl))
  facet into columns based on fl
t + facet_grid(rows = vars(year))
  facet into rows based on year
t + facet_grid(rows = vars(year), cols = vars(fl))
  facet into both rows and columns
t + facet_wrap(vars(fl))
  wrap facets into a rectangular layout

Set scales to let axis limits vary across facets

t + facet_grid(rows = vars(drv), cols = vars(fl), scales = "free")
  x and y axis limits adjust to individual facets
  "free_x" - x axis limits adjust
  "free_y" - y axis limits adjust

Set labeller to adjust facet labels

t + facet_grid(cols = vars(fl), labeller = label_both)
  Labels: fl: c, fl: d, fl: e, fl: p, fl: r
t + facet_grid(rows = vars(fl), labeller = label_bquote(alpha ^ .(fl)))
  Labels: α^c, α^d, α^e, α^p, α^r

Labels

t + labs(x = "New x axis label", y = "New y axis label",
  title = "Add a title above the plot",
  subtitle = "Add a subtitle below title",
  caption = "Add a caption below plot",
  <aes> = "New <aes> legend title")
  Use scale functions to update legend labels

t + annotate(geom = "text", x = 8, y = 9, label = "A")
  geom: geom to place; remaining arguments: manual values for geom's aesthetics

Legends

n + theme(legend.position = "bottom")
  Place legend at "bottom", "top", "left", or "right"
n + guides(fill = "none")
  Set legend type for each aesthetic: colorbar, legend, or none (no legend)
n + scale_fill_discrete(name = "Title", labels = c("A", "B", "C", "D", "E"))
  Set legend title and labels with a scale function.

Zooming

Without clipping (preferred)

t + coord_cartesian(xlim = c(0, 100), ylim = c(10, 20))

With clipping (removes unseen data points)

t + xlim(0, 100) + ylim(10, 20)
t + scale_x_continuous(limits = c(0, 100)) + scale_y_continuous(limits = c(0, 100))
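
Two claims in this half of the sheet can be checked directly: that a geom with its default stat is equivalent to the stat function with its default geom, and that zooming with scale limits removes data while coord_cartesian() does not. The sketch below is one way to inspect both, assuming ggplot2 is installed. It relies on layer_data() to expose the data a layer computes, and the x limits c(10, 20) and the result names (counts_geom, n_missing_clipped, and so on) are illustrative choices, not values from the cheat sheet.

```r
library(ggplot2)

# 1. A geom with its default stat and the matching stat function with its
#    default geom compute the same counts.
d <- ggplot(mpg, aes(fl))
counts_geom <- layer_data(d + geom_bar())$count
counts_stat <- layer_data(d + stat_count(geom = "bar"))$count

# 2. Zooming: coord_cartesian() keeps every observation, while scale limits
#    turn out-of-range x values into NA before the layer is drawn.
t <- ggplot(mpg, aes(cty, hwy)) + geom_point()
zoomed  <- layer_data(t + coord_cartesian(xlim = c(10, 20)))
clipped <- layer_data(t + scale_x_continuous(limits = c(10, 20)))

n_missing_zoomed  <- sum(is.na(zoomed$x))   # observations lost by zooming
n_missing_clipped <- sum(is.na(clipped$x))  # observations lost by clipping

print(rbind(geom_bar = counts_geom, stat_count = counts_stat))
print(c(zoomed = n_missing_zoomed, clipped = n_missing_clipped))
```

The difference matters most for layers that compute a stat, such as geom_smooth(): with scale limits the fit is computed from the remaining points only, which is why the sheet marks coord_cartesian() as preferred.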
