Data Visualization with ggplot2

Jennifer Thompson, MPH

2018-06-06

1 / 69

ggplot2: data visualization system based on the grammar of graphics

  • Quickly make beautiful, high-quality visualizations
  • Map plot characteristics directly to qualities of data
  • Automatically create informative plot legends
  • Show subsets of data using small multiples
2 / 69

Goals

We will understand how to:

  • Map data to aesthetics
  • Build plots in layers using geoms
  • Control aesthetics using scales
  • Show small multiples using facets
  • Control axes and legends using scales/guides
  • Change plot appearance using themes and other options
  • Save plots using ggsave()

Plan

  1. General overview of key concepts: aesthetics, geoms, layering, scales, facets, themes
  2. Step-by-step example, to demonstrate details

Note: This is not an exhaustive tutorial; it is an overview of the features I use most often. Much more at ggplot2.tidyverse.org!

3 / 69

The Grammar of Graphics

A little time learning the grammar ➡️

Power, ease in creating beautiful, informative graphics

(compare this to trying to remember all the arguments to par)

4 / 69

Components of Statistical Graphics

  • Data
  • Aesthetic mappings of the data (eg, location or size)
  • Geometric objects (ie, the shape of the data)
  • Scales
  • Statistical transformation
  • Coordinates
ggplot(
data = df,
aes(x = xvar, y = yvar)
) +
geom_point(stat = "identity") +
scale_x_continuous(
limits = c(min(df$xvar), max(df$xvar)),
name = "X Axis"
) +
scale_y_continuous(
limits = c(min(df$yvar), max(df$yvar)),
name = "Y Axis"
)

5 / 69

Base R

with(df, plot(yvar ~ xvar))

ggplot2

ggplot(
data = df,
aes(x = xvar, y = yvar)
) +
geom_point()

Notice: ggplot2 knows which stat, scales to use by default

6 / 69

It Always Starts with Data

ggplot(data = ..., ...)

Everything starts with a data.frame.

Anything that represents data on the plot must be within a data.frame.
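
The examples in these slides use a data.frame called df with numeric columns xvar and yvar, but its construction is not shown. A minimal sketch, assuming simulated data (the seed, sample size, and coefficients are made up for illustration):

```r
## Hypothetical stand-in for the `df` used in these slides
set.seed(2018)
df <- data.frame(xvar = rnorm(100))
## yvar is loosely related to xvar, plus noise
df$yvar <- 0.7 * df$xvar + rnorm(100, sd = 0.5)

str(df)
```

Any data.frame with one row per observation and one column per variable would work the same way.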

7 / 69

Represent Data with aesthetics

  • "Aesthetics" = how we map the values of data to the appearance of the plot
  • Examples:
    • X, Y axis values
    • Color, shape, sizes of points
    • Opacity, fill of shapes
8 / 69
print(head(df, n = 3), digits = 2)
##    xvar  yvar
## 1 -0.34 -0.08
## 2 -1.64 -1.03
## 3 -1.06 -0.78
ggplot(
data = df,
aes(x = xvar, y = yvar)
)

What Do You Notice?

  • Correct X, Y limits
  • Axis labels
  • ...no actual data

We haven't told it how to show the data!
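
One way to confirm this, sketched with a small made-up data.frame (demo_df is hypothetical): a plot object with only data and aesthetics holds no layers, and each geom adds one.

```r
library(ggplot2)

## Hypothetical data for illustration
demo_df <- data.frame(xvar = c(-1, 0, 1), yvar = c(-0.5, 0.2, 0.9))

## Data + aesthetics only: axes are set up, but nothing is drawn
p_empty <- ggplot(data = demo_df, aes(x = xvar, y = yvar))
length(p_empty$layers)

## Adding a geom adds a layer that draws the data
p_points <- p_empty + geom_point()
length(p_points$layers)
```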

10 / 69

aesthetics + geoms = 👯

11 / 69

Represent Data with geoms

geoms determine the shape of the data.

ggplot(
data = df, aes(x = xvar, y = yvar)
) +
geom_point()

12 / 69

Represent Data with geoms

geoms determine the shape of the data.

ggplot(
data = df, aes(x = xvar, y = yvar)
) +
geom_line()

13 / 69

Different geoms...

different aesthetics

The aesthetics you need depend on the geom you want to show.

Examples:

  • geom_point, geom_line each need only X, Y values
  • geom_ribbon needs X, but rather than a single Y, it needs ymin and ymax
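
To make the difference concrete, here is a minimal sketch using made-up data (band_df and its columns are hypothetical). The ribbon layer is given ymin and ymax, while the line layer is given a single y:

```r
library(ggplot2)

## Hypothetical data: a trend with made-up lower/upper bounds
band_df <- data.frame(time = 1:10)
band_df$est <- 2 + 0.5 * band_df$time
band_df$lower <- band_df$est - 1
band_df$upper <- band_df$est + 1

p_band <- ggplot(data = band_df, aes(x = time)) +
  ## geom_ribbon needs ymin + ymax rather than a single y
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  ## geom_line needs a single y
  geom_line(aes(y = est))
p_band
```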

Your Turn!

Look at the help files for these geoms and see what aesthetics each one needs.

library(ggplot2)
?geom_line
?geom_boxplot
?geom_bar
?geom_ribbon
14 / 69

Your Turn!

Using the gapminder dataset from the year 2007, show the relationship between gdpPercap and lifeExp using aesthetics and geoms.

# install.packages("gapminder")
library(gapminder)
head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333       779
## 2 Afghanistan Asia       1957    30.3  9240934       821
## 3 Afghanistan Asia       1962    32.0 10267083       853
## 4 Afghanistan Asia       1967    34.0 11537966       836
## 5 Afghanistan Asia       1972    36.1 13079460       740
## 6 Afghanistan Asia       1977    38.4 14880372       786
gap2007 <- subset(gapminder, year == 2007)
15 / 69

Your Turn!

p <- ggplot(
data = gap2007,
aes(x = gdpPercap, y = lifeExp)
)
p +
geom_point()

p +
geom_line()

16 / 69

geoms + stats

Some geoms require the user to supply all the information needed to map each point.

  • geom_point, geom_line, geom_ribbon

Others use stats behind the scenes to summarize the data.

ggplot(
data = df, aes(x = 1, y = yvar)
) +
geom_boxplot()

geom_boxplot aesthetics

Often we won't need to call stats explicitly; geoms have excellent defaults that do much of the work for us!
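
To see what a stat computes behind the scenes, one option is layer_data(), which returns the data a layer actually draws. A small sketch with simulated values (stat_df is hypothetical):

```r
library(ggplot2)

set.seed(1)
stat_df <- data.frame(yvar = rnorm(50))  # hypothetical data

p_box <- ggplot(data = stat_df, aes(x = "all", y = yvar)) +
  geom_boxplot()

## Summary values the boxplot stat computed for layer 1
box_stats <- layer_data(p_box, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]

## The middle line of the box is the median of the raw data
all.equal(box_stats$middle, median(stat_df$yvar))
```

We supplied 50 raw values; the stat reduced them to the five numbers the geom needs.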

17 / 69

Layers = Power 💪

18 / 69

Scenario: Raw Data + Regression Line

We want to represent the same data, using a summary as well as the raw data.

We can do this by layering geoms.

ggplot(
data = gap2007,
aes(x = gdpPercap, y = lifeExp)
) +
geom_point()

19 / 69

Scenario: Raw Data + Regression Line

We want to represent the same data, using a summary as well as the raw data.

We can do this by layering geoms.

ggplot(
data = gap2007,
aes(x = gdpPercap, y = lifeExp)
) +
geom_point() +
geom_smooth()

20 / 69

Scenario: Raw Data + Regression Line

We want to represent the same data, using a summary as well as the raw data.

We can do this by layering geoms.

ggplot(
data = gap2007,
aes(x = gdpPercap, y = lifeExp)
) +
geom_point() +
geom_smooth() +
geom_rug()

21 / 69

Your Turn!

Summarize and show raw data for each country's gross domestic product (gdpPercap).

For extra credit 😁, do this separately for each continent.

  • What geoms would you use?
  • What aesthetics would you use?
ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot() +
geom_point()

What would you change?

22 / 69

Positions

positions help us change the position of data a bit. Positions are still based on aesthetics, but sometimes it is helpful to modify those values.

For example, you may have many single points with the same value, or several groups contained within one value.

Common position functions:

  • position_dodge(): vertical position stays the same; horizontal changes
  • position_stack(): stacks bars on top of one another
  • position_fill(): stacks bars and standardizes each to the same height
  • position_jitter(): adds random noise to values to avoid overplotting
23 / 69

Example: Barcharts

rct_df <- data.frame(
trt = factor(c("A", "A", "B", "B")),
sex = factor(rep(c("Male", "Female"), 2)),
npts = c(52, 48, 65, 75)
)
ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position = position_dodge()
)

position_dodge()

24 / 69

Example: Barcharts

rct_df <- data.frame(
trt = factor(c("A", "A", "B", "B")),
sex = factor(rep(c("Male", "Female"), 2)),
npts = c(52, 48, 65, 75)
)
ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position = position_stack()
)

position_stack()

25 / 69

Example: Barcharts

rct_df <- data.frame(
trt = factor(c("A", "A", "B", "B")),
sex = factor(rep(c("Male", "Female"), 2)),
npts = c(52, 48, 65, 75)
)
ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position = position_fill()
)

position_fill()

26 / 69

Positions

Because positions are functions, we can add arguments to control them further.

ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position =
position_dodge2(padding = 0.2)
)

27 / 69

Positions

But if we want the default position settings, we can use shortcuts:

ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position = "dodge2"
)

28 / 69

Your Turn!

Take the boxplot we made earlier and use a position to reduce the overplotting of the raw data.

ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot() +
geom_point()

29 / 69

Your Turn!

ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot() +
geom_point(
position = "jitter"
)

30 / 69

Your Turn!

ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot() +
geom_point(
position =
position_jitter(width = 0.25)
)
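
A related detail: position_jitter() can add noise in both directions. If the Y values should stay exactly as observed, one approach is to set height = 0, and a seed makes the jitter reproducible. A sketch with made-up data (jit_df is hypothetical):

```r
library(ggplot2)

## Hypothetical data: two groups, many identical values per group
jit_df <- data.frame(
  grp = rep(c("a", "b"), each = 20),
  val = rep(c(1, 5), each = 20)
)

p_jit <- ggplot(data = jit_df, aes(x = grp, y = val)) +
  geom_point(
    ## jitter horizontally only; seed makes the noise reproducible
    position = position_jitter(width = 0.25, height = 0, seed = 123)
  )

## Y values in the drawn layer match the raw data exactly
jit_data <- layer_data(p_jit)
all(jit_data$y == jit_df$val)
```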

31 / 69

Inheritance

Did You Notice...

...so far, we have only explicitly specified our data and aesthetics in the initial ggplot() call?

Even when we had three separate layers!

ggplot2 uses inheritance. This means that each layer uses the same data and aesthetics set in ggplot(...), unless we tell it otherwise.

  • Simplicity: Inheritance means we don't have to specify data and aesthetics multiple times
  • Power: We can use different datasets or aesthetics for each layer if we want to
32 / 69

Example of Inheritance: Marginal Effects Plots

We may show the point estimate and confidence interval for a continuous variable from a linear regression model using geom_line and geom_ribbon, then show the original, unadjusted data using geom_point.

ggplot(
data = predvals,
aes(x = pointest, y = adjvalue)
) +
## Use inherited data;
## specify aesthetics
geom_ribbon(
aes(ymin = lcl, ymax = ucl)
) +
## Use inherited data + aesthetics
geom_line() +
## Add raw data
geom_point(
aes(x = covar, y = orgvalue),
data = orgdata
)
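
The objects predvals and orgdata are not defined in these slides. A minimal sketch of how they might be built, assuming a simple linear model on simulated data (the model, seed, and values are made up; only the column names follow the code above):

```r
library(ggplot2)

set.seed(42)
## Hypothetical raw data: one covariate, one outcome
orgdata <- data.frame(covar = runif(100, min = 0, max = 10))
orgdata$orgvalue <- 2 + 0.5 * orgdata$covar + rnorm(100)

mod <- lm(orgvalue ~ covar, data = orgdata)

## Predicted values + 95% confidence limits over a grid of covariate values
predvals <- data.frame(pointest = seq(0, 10, length.out = 50))
preds <- predict(
  mod,
  newdata = data.frame(covar = predvals$pointest),
  interval = "confidence"
)
predvals$adjvalue <- preds[, "fit"]
predvals$lcl <- preds[, "lwr"]
predvals$ucl <- preds[, "upr"]

p_me <- ggplot(data = predvals, aes(x = pointest, y = adjvalue)) +
  ## inherited data; adds ymin/ymax to inherited x
  geom_ribbon(aes(ymin = lcl, ymax = ucl), alpha = 0.3) +
  ## inherited data + aesthetics
  geom_line() +
  ## its own data + aesthetics
  geom_point(aes(x = covar, y = orgvalue), data = orgdata)
p_me
```

The alpha on the ribbon is an addition here so the points underneath stay visible.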
33 / 69

⚖️ Scales for Details ⚖️

aesthetics

What data is mapped to which plot characteristic

aes(x = ...)

scales

How to map
data to plot characteristics

scale_x_...(...)

34 / 69

Boxplot Example

aes(thetics)

Put gdpPercap on Y axis

ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot() +
geom_point() +
scale_y_continuous(
limits = c(0, 5000),
breaks = seq(0, 5000, 1025),
labels = scales::comma,
name = "GDP per Capita"
)

scales

  • Set the axis limits
  • "Break" axis at these places
  • "Name" the axis "GDP per Capita"

Note: ggplot2 automatically set gridlines at our break points!

35 / 69

Barchart Example

Note how scales control the legend!

aes(thetics)

"Fill" the bars with colors by sex

ggplot(
data = rct_df,
aes(x = trt, y = npts,
group = sex, fill = sex)
) +
geom_bar(
stat = "identity",
position = "dodge2"
) +
scale_fill_hue(
h = c(90, 270), l = 40,
## change *hues*, *lightness*
name = "Patient Sex"
## change *name*
)

scales

  • Use a different color palette
  • Change name to "Patient Sex"

36 / 69

Types of scales

Scales generally correspond to aesthetics. Some common scale types:

  • scale_[x, y]_[continuous/discrete]
  • scale_[colour, fill]_[many options!]
  • scale_size_..., scale_shape_..., scale_alpha_...

This is not an exhaustive list. See the ggplot2 reference pages for more options.

Note: For color scales and aesthetics, you can use either color or colour.

scales have intelligent default values, but you can use different scale types to use specific values you choose. Examples:

  • A particular color palette
  • Beginning and ending sizes (maybe you want the smallest size to still be seen on a projector)
  • Breaks on X, Y axes at clinically relevant points
37 / 69

Your Turn!

Using the boxplot we made earlier, use scales (and maybe aesthetics) to

  1. Change the name of the X axis
  2. Make each continent's raw data a different color
  3. Remove the legend

?scale_x_discrete

?scale_color_hue

ggplot(
data = gap2007,
aes(x = continent, y = gdpPercap)
) +
geom_boxplot(outlier.shape = NA) +
geom_point(
aes(color = continent),
position =
position_jitter(width = 0.2),
alpha = 0.6
) +
scale_x_discrete(name="Continent") +
scale_color_hue(guide = "none")

38 / 69

Choosing a color scale

By default, ggplot2 uses color scales that allow for the most difference between categories on a color wheel, or to show a spectrum of continuous values.

You can change these defaults in several ways, including:

  • Tweak the defaults with scale_color_hue() or scale_color_gradient() - for example, change the gradient color from the default blue to green, or change the range of hues for a categorical variable
  • Use one of the built-in ColorBrewer schemes, which are built to handle sequential, diverging, and qualitative color schemes (scale_color_brewer() for categorical, scale_color_distiller() for continuous)
  • Supply a manual color scheme, using words like "blue" or hex colors (eg, #FAFAFA): scale_color_manual()
  • I personally like the viridis color schemes. (These are included in ggplot2 itself as of version 3.0.0: scale_color_viridis_d() for discrete values and scale_color_viridis_c() for continuous values.) These scales print well, even in black & white, and are designed to be distinguishable by people with color blindness.

(All scale options above also apply to scale_fill_xxxx)
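
A minimal sketch of three of these options side by side, using a made-up data.frame (col_df and the chosen colors are hypothetical):

```r
library(ggplot2)

## Hypothetical data with a three-level grouping variable
col_df <- data.frame(
  x = c(1, 2, 3),
  y = c(2, 4, 3),
  grp = c("A", "B", "C")
)
p_col <- ggplot(data = col_df, aes(x = x, y = y, color = grp)) +
  geom_point(size = 4)

## ColorBrewer qualitative palette
p_brewer <- p_col + scale_color_brewer(palette = "Dark2")

## Manual colors: a named vector matches colors to levels
p_manual <- p_col + scale_color_manual(
  values = c(A = "blue", B = "#FAFAFA", C = "darkgreen")
)

## viridis, discrete version
p_viridis <- p_col + scale_color_viridis_d()
```

Naming the vector passed to values protects against colors being assigned to the wrong level if the factor order changes.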

39 / 69

Color Scale Examples: Default Hues

p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
geom_point(aes(color = country, size = pop), alpha = 0.5)
p + scale_color_hue(guide = "none")

40 / 69

Color Scale Examples: Manual Values

p + scale_color_manual(values = country_colors, guide = "none")

41 / 69

Color Scale Examples: viridis

## scale_color_viridis_d() is part of ggplot2 (>= 3.0.0)
p + scale_color_viridis_d(guide = "none")

42 / 69

🔠 Same Concept, Many facets 🔢

We can use facets to show the same visualization for related groups.

43 / 69

Example: GDP vs Life Expectancy

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
facet_wrap(~ continent) +
geom_point() +
geom_smooth()

44 / 69

facet_wrap()

You have one categorical variable

facet_grid()

You have two categorical variables, want to show each combination


Useful arguments (among others):

  • nrow, ncol (facet_wrap() only): if you want a specific layout
  • scales (both functions)
    • fixed: default; same for every panel
    • free_y, free_x, free: allow X and/or Y axis to change for each panel
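
A short sketch of these arguments with facet_wrap(), using made-up data (facet_df is hypothetical) in which each group sits on a very different Y scale:

```r
library(ggplot2)

## Hypothetical data: four groups on very different Y scales
facet_df <- data.frame(
  grp = rep(c("a", "b", "c", "d"), each = 5),
  x = rep(1:5, times = 4),
  y = rep(1:5, times = 4) * rep(c(1, 10, 100, 1000), each = 5)
)

p_wrap <- ggplot(data = facet_df, aes(x = x, y = y)) +
  ## two columns of panels; each panel gets its own Y axis
  facet_wrap(~ grp, ncol = 2, scales = "free_y") +
  geom_point()
p_wrap
```

With the default fixed scales, the points for groups a through c would be squashed near zero by group d.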
45 / 69

facet_grid() example

First + last years available for each continent in the gapminder data

ggplot(
data = subset(
gapminder,
year %in% c(1952, 2007) &
!(continent == "Oceania")
),
aes(x = gdpPercap, y = lifeExp)
) +
facet_grid(year ~ continent) +
geom_point() +
geom_smooth()

Leaving scales consistent between panels is great for comparisons between panels (like here). If we are more interested in the detail within each panel than in comparing panels, we may want to let the scales vary by panel.

46 / 69

facet_grid() example

ggplot(
data = subset(
gapminder,
year %in% c(1952, 2007) &
!(continent == "Oceania")
),
aes(x = gdpPercap, y = lifeExp)
) +
facet_grid(
year ~ continent,
scales = "free_x"
) +
geom_point() +
geom_smooth()

The X axis scales for each column are still the same for the two rows, allowing us to see the differences between 1952 and 2007 clearly for each continent.

But the X axis scales are different from column to column; we can see the relationships for Africa without being affected by the larger GDPs found in Europe, for example.

47 / 69

Your Turn!

Using the original gapminder data for 1952 and 2007, update your boxplots of GDP by continent to show one column for each of those two years, and one row for each continent.

Hint: You can set the X aesthetic to always be 1.

ggplot(
data = subset(
gapminder, year %in% c(1952, 2007)
),
aes(
x = 1,
y = gdpPercap
)
) +
facet_grid(
continent ~ year,
scales = "free_y"
) +
geom_boxplot() +
geom_point()

48 / 69

🌺 Themes 🌺

Give your plots a different look

Control plot elements not related to data

49 / 69

Change the look of your plot

Using ggplot2 themes, it is easy to give your plots a different look with one line.

p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
geom_point()
p + theme_bw()

p + theme_minimal()

50 / 69

Changing Plot Elements with themes

We also use themes to change elements of the plot that are not related to data:

  • Increase font sizes
  • Change colors or font faces
  • Change plot backgrounds, gridlines
p + theme_minimal()

p + theme_minimal() +
theme(
panel.border = element_rect(
fill = NA,
color = "#b756b9",
linewidth = 5 ## use "size" in ggplot2 < 3.4.0
)
)

51 / 69

elements

Pieces of the theme are controlled by the element functions (examples):

  • element_line(): gridlines, axis lines
  • element_text(): axis labels, titles, captions
  • element_rect(): plot and panel backgrounds, facet strip backgrounds
  • element_blank(): can be used for any element to "make it disappear"

Use these functions to set aspects like sizes, colors, font faces...

52 / 69

Your Turn!

Give a boxplot we made earlier a different overall theme, then customize at least one element. You might...

  • Make the axis text bold or bigger
  • Make the gridlines thicker or disappear
  • Make the strip.background a different color
p +
theme_minimal() +
theme(
axis.title.x = element_blank(),
axis.text.x = element_blank(),
strip.background =
element_rect(fill = "gray90"),
panel.background =
element_rect(fill = NA, color = "gray90"),
legend.position = "none"
)

53 / 69

Adding Labels and Titles

We may want to add a title to our plot, or a caption to give additional information about how something was defined. Maybe we want to name an aesthetic "Patient Sex" instead of "sex" but with less typing than scale_x_discrete(name = "Patient Sex").

labs() allows us to set these elements:

  • title
  • subtitle
  • caption
  • titles for aesthetics (these will show up in the legend, or X/Y axis titles)
p +
labs(x = "My X axis title")

Your Turn!

Use labs() to add a plot title and change the Y axis title on the plot you just made.

54 / 69

Controlling Non-Data Elements

To control elements of the plot that represent data, but not in a way directly tied to our data, we can still use aesthetic qualities. However, we will set them outside the aes(...) function.

Example: We want all our points to be a certain color, or to make a line width thicker to be seen better in a presentation.

ggplot(
data = gap2007,
aes(x = gdpPercap, y = lifeExp)
) +
geom_point(
color = "#a7a9ac",
alpha = 0.75,
size = 1.25
) +
geom_smooth(
fill = "#532354",
color = "#b756b9",
alpha = 0.2,
linewidth = 2 ## use "size" in ggplot2 < 3.4.0
) +
geom_rug(
alpha = 0.3, color = "#532354"
)
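
A common slip is to put a fixed value inside aes(...). A minimal sketch of the difference, using made-up data (aes_df is hypothetical):

```r
library(ggplot2)

aes_df <- data.frame(x = 1:3, y = c(2, 1, 3))  # hypothetical data

## Set outside aes(): every point is drawn in blue
p_set <- ggplot(data = aes_df, aes(x = x, y = y)) +
  geom_point(color = "blue")

## Mapped inside aes(): "blue" is treated as a data value with one level,
## so the points get the default color scale and a legend
p_mapped <- ggplot(data = aes_df, aes(x = x, y = y)) +
  geom_point(aes(color = "blue"))

unique(layer_data(p_set)$colour)
unique(layer_data(p_mapped)$colour)
```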

55 / 69

🌎 Full Examples with gapminder 🌏

56 / 69

Goal

Create a publication-quality chart showing the relationship between gross domestic product and life expectancy, 1952 and 2007.

Preparation

1) Create a subset of the data

## Only look at 1952, 2007
gap_sub <- subset(
gapminder,
year %in% c(1952, 2007)
)

2) Initialize our plot object

gdp_exp <- ggplot(
data = gap_sub,
aes(x = gdpPercap, y = lifeExp)
)
gdp_exp

57 / 69

facet by year and continent

gdp_exp <- gdp_exp +
facet_grid(year ~ continent)

58 / 69

Add data points

We want to use a different color for each country, and make our point sizes vary according to the countries' population size.

gdp_exp <- gdp_exp +
geom_point(
aes(color = country, size = pop),
alpha = 0.6 ## to reduce overplotting
)

59 / 69

😱 Fix it! With scales

As we saw, a legend for color will not be helpful; there are too many countries.

However, a legend would be helpful for the population sizes.

We will use two scales to

1) Specify the colors we want, using scale_color_manual and country_colors, a named vector of hex colors that comes with the gapminder package

head(country_colors)
## Nigeria Egypt Ethiopia Congo, Dem. Rep.
## "#7F3B08" "#833D07" "#873F07" "#8B4107"
## South Africa Sudan
## "#8F4407" "#934607"

2) Format a legend for our population sizes

60 / 69

Modification of scales

gdp_exp <- gdp_exp +
scale_color_manual(
## Manually specify colors
values = country_colors,
## Turn off the legend for colors
guide = "none"
) +
scale_size(
range = c(3, 7),
labels = function(x){
scales::comma(x / 1000000)
},
name =
"Population\n(x 1M)"
)

61 / 69

Exploratory data analysis

We see that there is a lot of space on the X axis that may be unnecessary, possibly caused by one point. What point is it?

ggplot(
data = gap_sub,
aes(x = gdpPercap)
) +
geom_histogram()

The extreme outlier in 1952 is from Kuwait; looking at other years in the gapminder data, it seems to be a legitimate value, but including it is keeping us from seeing the other data as clearly.

We'll exclude it from our plot, but will make a note in a figure caption.
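
The histogram shows that an outlier exists, but not which country it is. One way to look it up, sketched here assuming the gapminder package is installed:

```r
library(gapminder)

gap_sub <- subset(gapminder, year %in% c(1952, 2007))

## Row with the largest GDP per capita in the subset
top_gdp <- gap_sub[which.max(gap_sub$gdpPercap), ]
top_gdp[, c("country", "year", "gdpPercap", "lifeExp")]
```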

62 / 69

Modify X axis scale

## Save X axis title to a string so we can see the whole thing
xtitle <- "Gross Domestic Product per Capita\n(x $1,000 USD)"
gdp_exp <- gdp_exp +
scale_x_continuous(
limits = c(0, 55000),
labels = function(x){
scales::dollar(x / 1000)
},
name = xtitle
)

63 / 69

Add Labels

plottitle <- "GDP vs Life Expectancy, 1952 and 2007"
captitle <- sprintf(
"Kuwait's 1952 values were excluded due to its extremely high GDP of $%s.\nAverage life expectancy at that time was %s.",
format(
round(subset(gap_sub, year == 1952 & country == "Kuwait")$gdpPercap),
big.mark = ","
),
round(subset(gap_sub, year == 1952 & country == "Kuwait")$lifeExp, 1)
)
gdp_exp <- gdp_exp +
labs(
title = plottitle,
subtitle =
"Source: gapminder.org/data",
caption = captitle,
y = "Life Expectancy (Years)"
)

64 / 69

Modify Theme Elements

We want to

  • Add space between the X axis and title
  • Add space between X axis title and caption
  • Italicize the caption
  • Bold plot titles, axis titles, and strip text
  • Move the legend to the bottom
gdp_exp <- gdp_exp +
theme_bw() +
theme(
plot.title = element_text(
face = "bold", size = 16
),
axis.title.x = element_text(
vjust = 0
),
plot.caption = element_text(
vjust = 0, face = "italic"
),
strip.text = element_text(
face = "bold", size = 12
),
legend.position = "bottom",
legend.direction = "horizontal"
)
65 / 69

Final Result

66 / 69

Your Turn!

Using any of these strategies (or others!), make changes to the boxplots we've been working with.

Some ideas:

  • Changing the color of the boxplots, or removing the inside altogether
  • Changing the breaks of the Y axis
  • Adding axis, plot titles
  • Citing the source of our data

Any other ideas are welcome!

67 / 69

Saving Your Plots

The ggsave function allows us to easily save our figures in several formats. For example, you might want to create a PDF of the figure for a journal submission, but have a PNG for PowerPoint presentations.

ggsave(
filename =
"gapminder_gdplifeexp.pdf",
gdp_exp,
device = "pdf",
path = "figures/",
width = 10,
height = 7,
units = "in"
)
ggsave(
filename =
"gapminder_gdplifeexp.png",
gdp_exp,
device = "png",
path = "figures/",
width = 10,
height = 7,
units = "in"
)
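
Depending on the ggplot2 version, ggsave() may fail if the target directory does not exist yet. A minimal sketch that creates the directory first; the plot, file name, and temporary location are hypothetical stand-ins:

```r
library(ggplot2)

## Hypothetical output location; use "figures" in a real project
fig_dir <- file.path(tempdir(), "figures")
if (!dir.exists(fig_dir)) {
  dir.create(fig_dir, recursive = TRUE)
}

## Hypothetical plot to save
demo_plot <- ggplot(
  data = data.frame(x = 1:10, y = (1:10)^2),
  aes(x = x, y = y)
) +
  geom_point()

ggsave(
  filename = "demo_plot.pdf",
  plot = demo_plot,
  device = "pdf",
  path = fig_dir,
  width = 10,
  height = 7,
  units = "in"
)

## Confirm the file was written
file.exists(file.path(fig_dir, "demo_plot.pdf"))
```

Naming the plot argument explicitly avoids relying on argument position or on whichever plot was displayed last.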

Your Turn!

Save the results of your plot to a figures/ directory in two different formats.

68 / 69

Helpful Resources

69 / 69
