
Customizing charts

Andrew Ba Tran

Contents

• Reordering chart labels
• Lollipop plot
• Saving ggplots
• Scales
• Scales for color and fill
• Annotations
• Themes
• Your turn

This is from the fourth chapter of learn.r-journalism.com. Let’s bring that data back in again.

library(readr)

ages <- read_csv("data/ages.csv")

Remember that dot plot we made before?

library(ggplot2)

ggplot(ages, aes(x=actress_age, y=Movie)) + geom_point()

[Figure: dot plot of Movie (y axis) against actress_age (x axis, 20 to 50), with the movies listed in reverse alphabetical order.]

It’s not that great, right? It’s in reverse alphabetical order. Let’s reorder it based on age.

Reordering chart labels

This means we need to transform the data. The easiest way to do this is with the package forcats, which (surprise!) is also part of the tidyverse. The function is fct_reorder() and it works like this:

# If you don't have forcats installed yet, uncomment the line below and run
# install.packages("forcats")

library(forcats)

ggplot(ages,
       aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +
  geom_point()

[Figure: dot plot of actress_age (x axis, 20 to 50) by movie, with the movies ordered by actress age from Shall We Dance at the top to Edward Scissorhands at the bottom. The y axis is labeled fct_reorder(Movie, actress_age, desc = TRUE).]
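To see what fct_reorder() does to the factor levels, here is a minimal sketch on a made-up three-row data frame (the toy data frame and its values are illustrative, not taken from ages.csv). One thing worth knowing: the forcats documentation spells the descending-order argument .desc, with a leading dot. An argument written as desc is not that parameter and is instead forwarded through ... to the summary function.

```r
library(forcats)

# Hypothetical toy data standing in for ages.csv
toy <- data.frame(
  Movie = c("Alpha", "Bravo", "Charlie"),
  actress_age = c(30, 19, 58)
)

# Default behavior: levels sorted from lowest to highest actress_age
ascending <- fct_reorder(toy$Movie, toy$actress_age)
levels(ascending)

# .desc = TRUE (note the leading dot) reverses that order
descending <- fct_reorder(toy$Movie, toy$actress_age, .desc = TRUE)
levels(descending)
```

ggplot2 draws the first level of a discrete y axis at the bottom, so ascending levels put the highest age at the top of the chart.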

Not a bad looking chart. We can tweak it a little more and turn it into a lollipop plot.

Lollipop plot

This time we’re going to use a new geom_: geom_segment()

ggplot(ages,
       aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +
  geom_segment(
    aes(x = 0,
        xend = actress_age,
        yend = fct_reorder(Movie, actress_age, desc=TRUE)),
    color = "gray50") +
  geom_point()

[Figure: lollipop plot of actress_age (x axis, 0 to 60) by movie, with a gray segment running from 0 to each point and the movies ordered by actress age.]

Looking interesting, right? If we wanted to publish this on a website or share it on social media, we’d need to clean up the labels and add a title and a source line. That’s easy to do.

ggplot(ages,
       aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +
  geom_segment(
    aes(x = 0,
        y = fct_reorder(Movie, actress_age, desc=TRUE),
        xend = actress_age,
        yend = fct_reorder(Movie, actress_age, desc=TRUE)),
    color = "gray50") +
  geom_point() +
  # NEW CODE BELOW
  labs(x="Actress age", y="Movie",
       title = "Actress ages in movies",
       subtitle = "for R for Journalists class",
       caption = "Data from Vulture.com and IMDB") +
  theme_minimal()

[Figure: lollipop plot titled "Actress ages in movies", subtitle "for R for Journalists class", x axis "Actress age" (0 to 60), y axis "Movie", caption "Data from Vulture.com and IMDB".]

So we added a lot of information to the labs() function: x, y, title, subtitle, and caption. We also added theme_minimal() which changed a lot of the style, such as the gray grid background. What if we wanted to clean it up even more? It’s such a tall chart, it’s difficult to keep track of the actual age represented by the lollipop. Let’s get rid of the grids and add the numbers to the right of each dot.

ggplot(ages,
       aes(x=actress_age, y=fct_reorder(Movie, actress_age, desc=TRUE))) +
  geom_segment(
    aes(x = 0,
        y = fct_reorder(Movie, actress_age, desc=TRUE),
        xend = actress_age,
        yend = fct_reorder(Movie, actress_age, desc=TRUE)),
    color = "gray50") +
  geom_point() +
  labs(x="Actress age", y="Movie",
       title = "Actress ages in movies",
       subtitle = "for R for Journalists class",
       caption = "Data from Vulture.com and IMDB") +
  theme_minimal() +
  # NEW CODE BELOW
  geom_text(aes(label=actress_age), hjust=-.5) +
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_blank(),
        axis.text.x = element_blank())

[Figure: lollipop plot titled "Actress ages in movies", subtitle "for R for Journalists class", with grid lines and x-axis text removed and each lollipop labeled with the actress age, from 58 (Shall We Dance) at the top to 19 (Edward Scissorhands) at the bottom. Caption: "Data from Vulture.com and IMDB".]

So, we added two new ggplot2 elements: geom_text() and theme(). We passed the actress_age variable to label and also used hjust=, which adjusts the label's horizontal position. Alternatively, vjust would adjust it vertically.

In theme() we passed several arguments, including panel.border and axis.text.x, and set each of them to element_blank(). Each piece of the chart can be customized, or eliminated with element_blank().

Not bad looking! Let’s save it.
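As a rough illustration of how hjust and vjust move a label relative to its point, here is a small sketch on made-up data (the toy data frame is hypothetical). It only builds the plot object; printing p would draw it.

```r
library(ggplot2)

# Hypothetical toy data standing in for ages.csv
toy <- data.frame(
  Movie = c("Alpha", "Bravo", "Charlie"),
  actress_age = c(30, 19, 58)
)

p <- ggplot(toy, aes(x = actress_age, y = Movie)) +
  geom_point() +
  # hjust = 0 starts the label at the point; negative values push it further right
  geom_text(aes(label = actress_age), hjust = -0.5) +
  # vjust works the same way vertically: negative values move the label up
  geom_text(aes(label = Movie), vjust = -1, color = "gray50") +
  theme_minimal() +
  # element_blank() removes a single piece of the chart, here the minor grid lines
  theme(panel.grid.minor = element_blank())
```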

Saving ggplots

We’ll use ggsave() from the ggplot2 package. File types that can be exported:

• png
• tex
• pdf
• jpeg
• tiff
• bmp
• svg

You can specify the width and height of the image in units of "in", "cm", or "mm". Otherwise it saves at the size the chart is displayed on your screen.

ggsave("actress_ages.png")

## Saving 6.5 x 4.5 in image

How’s it look?

Ew, okay. Needs some adjustment. I guess we can’t go with the default display for this particular chart.

ggsave("actress_ages_adjusted.png", width=20, height=30, units="cm")

Much better! You could then save it as a .svg file and tweak it even further in Adobe Illustrator or Inkscape.

Alright, I’m going to tweak it some more by adding actor ages. We just need to adjust the geom_segment() and add another geom_point() layer so it uses the actor_age variable.

# First, let's permanently reorder the data frame so we don't have to keep using fct_reorder

library(dplyr)

ages_reordered <- ages %>% mutate(Movie=fct_reorder(Movie, desc(actor_age)))

ggplot(ages_reordered) +
  geom_segment(
    aes(x = actress_age,
        y = Movie,
        xend = actor_age,
        yend = Movie),
    color = "gray50") +
  geom_point(aes(x=actress_age, y=Movie), color="dark green") +
  geom_point(aes(x=actor_age, y=Movie), color="dark blue") +
  labs(x="", y="",
       title = "Actor and actress ages in movies",
       subtitle = "for R for Journalists class",
       caption = "Data from Vulture.com and IMDB") +
  theme_minimal() +
  geom_text(aes(x=actress_age, y=Movie, label=actress_age), hjust=ifelse(ages$actress_age

[Figure: chart titled "Actor and actress ages in movies", subtitle "for R for Journalists class". Each movie has a gray segment connecting the actress age and the actor age, with both ages labeled, and the movies are ordered by actor age from Losin It (20 and 33) at the top to Arbitrage (34 and 63) at the bottom. Caption: "Data from Vulture.com and IMDB".]

This time I left the x and y axis labels blank because they seemed redundant.
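For readers who want to experiment with labeling both ends of each segment, here is a minimal sketch on made-up data. It uses two geom_text() layers with fixed hjust values, which is an assumption made for illustration and not necessarily how the chart above was built. The toy data frame and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: in every row the actress is younger than the actor
toy <- data.frame(
  Movie = c("Alpha", "Bravo", "Charlie"),
  actress_age = c(30, 19, 41),
  actor_age = c(45, 36, 56)
)

p <- ggplot(toy) +
  geom_segment(
    aes(x = actress_age, y = Movie, xend = actor_age, yend = Movie),
    color = "gray50") +
  geom_point(aes(x = actress_age, y = Movie), color = "darkgreen") +
  geom_point(aes(x = actor_age, y = Movie), color = "darkblue") +
  # hjust greater than 1 places the label to the left of the actress point
  geom_text(aes(x = actress_age, y = Movie, label = actress_age), hjust = 1.5) +
  # hjust below 0 places the label to the right of the actor point
  geom_text(aes(x = actor_age, y = Movie, label = actor_age), hjust = -0.5) +
  theme_minimal()
```

Fixed offsets work here only because the actress is the younger of the two in every toy row. If the order can flip, the labels would need to switch sides row by row.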

Scales

Let’s talk about scales.

Axes

• scale_x_continuous()
• scale_y_continuous()
• scale_x_discrete()
• scale_y_discrete()

Colors

• scale_color_continuous()
• scale_color_manual()
• scale_color_brewer()

Fill

• scale_fill_continuous()
• scale_fill_manual()

ggplot(ages, aes(x=actor_age, y=actress_age)) +
  geom_point() +
  scale_x_continuous(breaks=seq(20,30,2), limits=c(20,30)) +
  scale_y_continuous(breaks=seq(20,40,4), limits=c(20,40))

## Warning: Removed 67 rows containing missing values (geom_point).

[Figure: scatter plot of actress_age (y axis, 20 to 40 in steps of 4) against actor_age (x axis, 20 to 30 in steps of 2).]

By setting breaks in scale_x_continuous(), we placed the tick marks on the x axis at intervals of 2. By setting limits, we restricted the x axis to the range between 20 and 30.

By setting breaks in scale_y_continuous(), we placed the tick marks on the y axis at intervals of 4. And we restricted the y axis with limits between 20 and 40.

All data points outside those ranges were dropped, which is what the warning reports. That was limiting the scale for continuous data.
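To make the difference between breaks and limits concrete, here is a small sketch on made-up data (the toy data frame is hypothetical). It computes the break positions and counts the rows that fall outside the limits before building the same kind of plot.

```r
library(ggplot2)

# Hypothetical toy data standing in for ages.csv
toy <- data.frame(
  actor_age = c(22, 28, 35, 50),
  actress_age = c(21, 30, 33, 45)
)

# breaks: the positions where tick marks and grid lines are drawn
x_breaks <- seq(20, 30, 2)
x_breaks

# limits: rows outside either range are dropped before plotting
inside <- toy$actor_age >= 20 & toy$actor_age <= 30 &
  toy$actress_age >= 20 & toy$actress_age <= 40
n_dropped <- sum(!inside)
n_dropped

p <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  scale_x_continuous(breaks = x_breaks, limits = c(20, 30)) +
  scale_y_continuous(breaks = seq(20, 40, 4), limits = c(20, 40))
```

Drawing p should produce a warning about removed rows, similar to the one shown for the full data set.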

Here’s how to set limits on discrete data.

ggplot(ages, aes(x=actor)) +
  geom_bar() +
  scale_x_discrete(limits=c("", "", ""))

## Warning: Removed 43 rows containing non-finite values (stat_count).

[Figure: bar chart of count (y axis, 0 to 9) by actor (x axis), showing only Tom Hanks, Tom Cruise, and Denzel Washington, in that order.]
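As a sketch of how limits behaves on a discrete scale, the example below uses made-up rows and the three actor names that appear on the x axis of the chart above. The toy data frame and the keep vector are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical toy data: one row per movie, five rows in total
toy <- data.frame(
  actor = c("Tom Hanks", "Tom Hanks", "Tom Cruise",
            "Johnny Depp", "Denzel Washington")
)

# The categories to show, in the left-to-right order they should appear
keep <- c("Tom Hanks", "Tom Cruise", "Denzel Washington")

# Rows whose actor is not listed in keep are left out of the bar chart
n_outside <- sum(!toy$actor %in% keep)
n_outside

p <- ggplot(toy, aes(x = actor)) +
  geom_bar() +
  scale_x_discrete(limits = keep)
```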

Scales for color and fill

It’s possible to manually change the colors of your chart. You can use hex codes or the name of a color if R recognizes it. We’ll use scale_fill_manual().

library(dplyr)

avg_age <- ages %>%
  group_by(actor) %>%
  mutate(age_diff = actor_age-actress_age) %>%
  summarize(average_age_diff = mean(age_diff))

ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
  geom_bar(stat="identity") +
  # This removes the legend
  theme(legend.position="none") +
  scale_fill_manual(values=c("aquamarine", "darkorchid", "deepskyblue2",
                             "lemonchiffon2", "orange", "peachpuff3", "tomato"))

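[Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis), with each bar filled in its manually assigned color.]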
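A minimal sketch of the same idea using hex codes and a named vector, so that each color is tied to a specific actor instead of depending on the order of the bars. The toy data frame, its values, and the chosen colors are made up for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for avg_age
toy <- data.frame(
  actor = c("Tom Hanks", "Tom Cruise", "Johnny Depp"),
  average_age_diff = c(5, 12, 16)
)

# Names match the values of the fill variable; colors can be hex codes or color names
actor_colors <- c(
  "Tom Hanks" = "#1b9e77",
  "Tom Cruise" = "tomato",
  "Johnny Depp" = "#7570b3"
)

p <- ggplot(toy, aes(x = actor, y = average_age_diff, fill = actor)) +
  geom_bar(stat = "identity") +
  theme(legend.position = "none") +
  scale_fill_manual(values = actor_colors)
```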

You can also specify a color palette using scale_fill_brewer() or scale_color_brewer()

ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
  geom_bar(stat="identity") +
  theme(legend.position="none") +
  scale_fill_brewer()

[Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis: Denzel Washington, George Clooney, Harrison Ford, Johnny Depp, Richard Gere, Tom Cruise, Tom Hanks), filled with the default brewer palette.]

Check out some of the other palette options that can be passed to brewer.

ggplot(avg_age, aes(x=actor, y=average_age_diff, fill=actor)) +
  geom_bar(stat="identity") +
  theme(legend.position="none") +
  scale_fill_brewer(palette="Pastel1")

[Figure: bar chart of average_age_diff (y axis, 0 to 15) by actor (x axis: Denzel Washington, George Clooney, Harrison Ford, Johnny Depp, Richard Gere, Tom Cruise, Tom Hanks), filled with the Pastel1 palette.]
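One way to explore the available palettes without drawing a chart is to query the RColorBrewer package directly. This sketch assumes RColorBrewer is installed (it is normally pulled in alongside ggplot2) and simply lists the Pastel1 colors for seven categories along with the palette table.

```r
library(RColorBrewer)

# Hex codes of the Pastel1 palette for seven categories, one per actor
pastel <- brewer.pal(7, "Pastel1")
pastel

# brewer.pal.info lists every palette name with its maximum number of colors
head(brewer.pal.info)
brewer.pal.info["Pastel1", "maxcolors"]
```

Any palette name from that table can be passed as the palette argument of scale_fill_brewer() or scale_color_brewer(), as long as it has at least as many colors as there are categories.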

Did you know that someone made a Wes Anderson color palette package based on his different movies?

Annotations

You can annotate charts with annotate() and geom_hline() or geom_vline().

ggplot(ages, aes(x=actor_age, y=actress_age)) +
  geom_point() +
  geom_hline(yintercept=50, color="red") +
  annotate("text", x=40, y=51, label="Random text for some reason", color="red")

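[Figure: scatter plot of actress_age (y axis, 20 to 50) against actor_age (x axis, 20 to 60), with a red horizontal line at 50 and the red label "Random text for some reason" just above it.]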
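The vertical counterpart works the same way. Here is a small sketch on made-up data that draws a dashed vertical line with geom_vline() and labels it with annotate(). The toy data frame, the position of the line, and the label text are all illustrative.

```r
library(ggplot2)

# Hypothetical toy data standing in for ages.csv
toy <- data.frame(
  actor_age = c(25, 40, 55),
  actress_age = c(22, 30, 41)
)

p <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  # geom_vline() takes xintercept, where geom_hline() takes yintercept
  geom_vline(xintercept = 40, color = "blue", linetype = "dashed") +
  # x and y are given in data units; hjust = 0 starts the text at x = 41
  annotate("text", x = 41, y = 25, label = "Actor age 40",
           color = "blue", hjust = 0)
```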

Themes

You’ve seen an example of a theme used in a previous chart: theme_minimal(). ggplot2 ships with others, such as theme_bw(). But there are many more that have been created and collected into the ggthemes library. Here’s one based on The Economist.

# If you don't have ggthemes installed yet, uncomment the line below and run it
# install.packages("ggthemes")

library(ggthemes)

ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
  geom_point() +
  theme_economist() +
  scale_colour_economist()

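[Figure: scatter plot of actress_age (y axis, 20 to 50) against actor_age (x axis, 20 to 60) in the Economist theme, with points colored by actor and a legend listing Denzel Washington, George Clooney, Harrison Ford, Johnny Depp, Richard Gere, Tom Cruise, and Tom Hanks.]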

Here’s one based on FiveThirtyEight’s style (though it’s not the official one).

ggplot(ages, aes(x=actor_age, y=actress_age, color=actor)) +
  geom_point() +
  theme_fivethirtyeight()

[Figure: the same scatter plot in the FiveThirtyEight-style theme, with axes running from 20 to 50 (y) and 20 to 60 (x), no axis titles, and the actor legend below the chart.]

Check out all the other ones currently available.

It’s not difficult to make your own. It’s just time-consuming. It involves tweaking every little detail, like the text, the colors, and how the grids should look.

Check out the theme that the Associated Press uses. They posted it on their repo, and by loading their own package they can just add theme_ap() at the end of their charts to transform them to AP style.
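As a rough idea of what making your own theme involves, here is a minimal sketch of a theme function built on top of theme_minimal(). The name theme_newsroom() and every styling choice in it are hypothetical, and it is not related to the AP's theme_ap().

```r
library(ggplot2)

# Hypothetical house style: start from theme_minimal() and override a few details
theme_newsroom <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Hypothetical toy data standing in for ages.csv
toy <- data.frame(
  actor_age = c(25, 40, 55),
  actress_age = c(22, 30, 41)
)

p <- ggplot(toy, aes(x = actor_age, y = actress_age)) +
  geom_point() +
  labs(title = "Toy chart") +
  theme_newsroom()
```

Wrapping the tweaks in a function means they only need to be written once and can then be added to the end of any chart.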

Your turn

Challenge yourself with these exercises so you’ll retain the knowledge of this section. Instructions on how to run the exercise app are on the intro page to this section.
