ggplot2: data visualization system based on the grammar of graphics

We will understand how:
aesthetics
geoms
scales
facets
scales/guides
themes
and other options
ggsave()
aesthetics, geoms, layering, scales, facets, themes
Note: This is not an exhaustive tutorial; it is an overview of the features I use most often. Much more at ggplot2.tidyverse.org!
(Compare this to trying to remember all the arguments to par.)
ggplot(data = df, aes(x = xvar, y = yvar)) +
  geom_point(stat = "identity") +
  scale_x_continuous(
    limits = c(min(df$xvar), max(df$xvar)),
    name = "X Axis"
  ) +
  scale_y_continuous(
    limits = c(min(df$yvar), max(df$yvar)),
    name = "Y Axis"
  )

with(df, plot(yvar ~ xvar))

ggplot(data = df, aes(x = xvar, y = yvar)) +
  geom_point()
ggplot(data = ..., ...)
Everything starts with a data.frame.
Anything that represents data on the plot must be within a data.frame.
aesthetics
print(head(df, n = 3), digits = 2)

##    xvar  yvar
## 1 -0.34 -0.08
## 2 -1.64 -1.03
## 3 -1.06 -0.78

ggplot(data = df, aes(x = xvar, y = yvar))
aesthetics + geoms = 👯

geoms

geoms determine the shape of the data.

ggplot(data = df, aes(x = xvar, y = yvar)) +
  geom_point()
ggplot(data = df, aes(x = xvar, y = yvar)) +
  geom_line()
geoms ... aesthetics

The aesthetics you need depend on the geom you want to show.

Examples:
geom_point, geom_line each need only X, Y values
geom_ribbon needs X, but rather than a single Y, it needs ymin and ymax
Look at the help files for these geoms and see what aesthetics each one needs.

library(ggplot2)
?geom_line
?geom_boxplot
?geom_bar
?geom_ribbon
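As a small illustration of how required aesthetics differ by geom, the sketch below draws a ribbon and a line from the same data. The data frame `band_df` and its columns are made up for this example; they are not part of the original material.

```r
library(ggplot2)

# Hypothetical data: an estimate with lower and upper bounds at each x
band_df <- data.frame(
  x = 1:10,
  est = (1:10) * 2
)
band_df$lower <- band_df$est - 1.5
band_df$upper <- band_df$est + 1.5

ribbon_plot <- ggplot(data = band_df, aes(x = x)) +
  # geom_ribbon needs ymin and ymax instead of a single y
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  # geom_line needs a single y
  geom_line(aes(y = est))

ribbon_plot
```

Leaving out ymin or ymax in the ribbon layer should produce an error about missing aesthetics, which is a quick way to discover what a geom requires.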
Using the gapminder dataset from the year 2007, show the relationship between gdpPercap and lifeExp using aesthetics and geoms.

# install.packages("gapminder")
library(gapminder)
head(gapminder)

## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333       779
## 2 Afghanistan Asia       1957    30.3  9240934       821
## 3 Afghanistan Asia       1962    32.0 10267083       853
## 4 Afghanistan Asia       1967    34.0 11537966       836
## 5 Afghanistan Asia       1972    36.1 13079460       740
## 6 Afghanistan Asia       1977    38.4 14880372       786
gap2007 <- subset(gapminder, year == 2007)
p <- ggplot( data = gap2007, aes(x = gdpPercap, y = lifeExp))
p + geom_point()
p + geom_line()
geoms + stats

Some geoms require the user to supply all the information needed to map each point.
geom_point, geom_line, geom_ribbon

Others use stats behind the scenes to summarize the data.

ggplot(data = df, aes(x = 1, y = yvar)) +
  geom_boxplot()

Often we won't need to call stats explicitly; geoms have excellent defaults that do much of the work for us!
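To see what a stat computes behind the scenes, one option is to inspect the layer's data after the plot is built. The sketch below uses simulated data (the `df` here is generated for the example, not the one from the slides) and ggplot2's `layer_data()` helper.

```r
library(ggplot2)

# Simulated data, for illustration only
set.seed(42)
df <- data.frame(yvar = rnorm(100))

box_plot <- ggplot(data = df, aes(x = 1, y = yvar)) +
  geom_boxplot()

# layer_data() returns the data for one layer after its stat has run;
# for a boxplot this includes the five summary values that get drawn
box_stats <- layer_data(box_plot, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]

# The "middle" value is the median of the raw data
median(df$yvar)
```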
We want to represent the same data, using a summary as well as the raw data.
We can do this by layering geoms.

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth()

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth() +
  geom_rug()
Summarize and show raw data for each country's gross domestic product (gdpPercap).
For extra credit 😁, do this separately for each continent.
Which geoms would you use?
Which aesthetics would you use?

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  geom_point()
positions help us change the position of data a bit. Positions are still based on aesthetics, but sometimes it is helpful to modify those values.
For example, you may have many single points with the same value, or several groups contained within one value.
Common position functions:
position_dodge(): vertical position stays the same; horizontal changes
position_stack(): stacks bars on top of one another
position_fill(): stacks bars and standardizes each to the same height
position_jitter(): adds random noise to values to avoid overplotting

rct_df <- data.frame(
  trt = factor(c("A", "A", "B", "B")),
  sex = factor(rep(c("Male", "Female"), 2)),
  npts = c(52, 48, 65, 75)
)

position_dodge()

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge())
position_stack()

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = position_stack())

position_fill()

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = position_fill())
Because positions are functions, we can add arguments to control them further.

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = position_dodge2(padding = 0.2))

But if we want the default position settings, we can use shortcuts:

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge2")
Take the boxplot we made earlier and use a position to reduce the overplotting of the raw data.

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  geom_point()

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  geom_point(position = "jitter")

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  geom_point(position = position_jitter(width = 0.25))
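One detail worth knowing: position_jitter() adds noise in both directions unless told otherwise. The sketch below, using a small made-up data frame (`tied_df` is hypothetical), sets height = 0 so the Y values stay exact, and uses the seed argument so the jitter is reproducible.

```r
library(ggplot2)

# Hypothetical data: two groups with several tied values
tied_df <- data.frame(
  group = rep(c("A", "B"), each = 6),
  value = c(1, 1, 1, 2, 2, 3, 2, 2, 2, 2, 3, 3)
)

jitter_plot <- ggplot(data = tied_df, aes(x = group, y = value)) +
  geom_boxplot() +
  geom_point(
    # jitter horizontally only; fix the seed so the plot is repeatable
    position = position_jitter(width = 0.2, height = 0, seed = 123)
  )

jitter_plot
```

Whether to allow vertical jitter is a judgment call; when the Y axis carries the measured value, keeping it exact is usually the safer choice.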
Did you notice that, so far, we have only explicitly specified our data and aesthetics in the initial ggplot() call?
Even when we had three separate layers!
ggplot2 uses inheritance. This means that each layer uses the same data and aesthetics set in ggplot(...), unless we tell it otherwise.
We may show the point estimate and confidence interval for a continuous variable from a linear regression model using geom_line and geom_ribbon, then show the original, unadjusted data using geom_point.

ggplot(data = predvals, aes(x = pointest, y = adjvalue)) +
  ## Use inherited data; specify aesthetics
  geom_ribbon(aes(ymin = lcl, ymax = ucl)) +
  ## Use inherited data + aesthetics
  geom_line() +
  ## Add raw data
  geom_point(aes(x = covar, y = orgvalue), data = orgdata)
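The data frames predvals and orgdata are not defined in these slides, so here is a self-contained sketch of the same idea with simulated data. The column names and the simple lm() fit are assumptions made for the example; only the layering pattern is the point.

```r
library(ggplot2)

# Simulated "original" data (hypothetical)
set.seed(1)
orgdata <- data.frame(covar = runif(40, 0, 10))
orgdata$orgvalue <- 2 + 0.5 * orgdata$covar + rnorm(40)

# Predicted values and confidence limits from a simple linear model
fit <- lm(orgvalue ~ covar, data = orgdata)
predvals <- data.frame(covar = seq(0, 10, length.out = 25))
preds <- predict(fit, newdata = predvals, interval = "confidence")
predvals$adjvalue <- preds[, "fit"]
predvals$lcl <- preds[, "lwr"]
predvals$ucl <- preds[, "upr"]

inherit_plot <- ggplot(data = predvals, aes(x = covar, y = adjvalue)) +
  # inherits data and x; adds ymin and ymax
  geom_ribbon(aes(ymin = lcl, ymax = ucl), alpha = 0.3) +
  # inherits data, x and y
  geom_line() +
  # inherits x; overrides y and swaps in a different data frame
  geom_point(aes(y = orgvalue), data = orgdata)

inherit_plot
```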
What data is mapped to which plot characteristic: aes(x = ...)
How to map data to plot characteristics: scale_x_...(...)
aes(thetics): Put gdpPercap on Y axis

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  geom_point() +
  scale_y_continuous(
    limits = c(0, 5000),
    breaks = seq(0, 5000, 1025),
    labels = scales::comma,
    name = "GDP per Capita"
  )

scales
Note: ggplot2 automatically set gridlines at our break points!
aes(thetics): "Fill" the bars with colors by sex

ggplot(data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge2") +
  scale_fill_hue(
    h = c(90, 270), l = 40,  ## change *hues*, *lightness*
    name = "Patient Sex"     ## change *name*
  )

scales
Note how scales control the legend!
scales

Scales generally correspond to aesthetics. Some common scale types:
scale_[x, y]_[continuous/discrete]
scale_[colour, fill]_[many options!]
scale_size_..., scale_shape_..., scale_alpha_...
This is not an exhaustive list. See the ggplot2 reference pages for more options.
Note: For color scales and aesthetics, you can use either color or colour.
scales have intelligent default values, but you can use different scale types to use specific values you choose. Examples:
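As one possible example of overriding the defaults, the sketch below reuses the rct_df data from the position slides and chooses the Y axis breaks and fill colors by hand. The specific limits, breaks, colors, and axis name are arbitrary choices made for illustration.

```r
library(ggplot2)

rct_df <- data.frame(
  trt = factor(c("A", "A", "B", "B")),
  sex = factor(rep(c("Male", "Female"), 2)),
  npts = c(52, 48, 65, 75)
)

scale_plot <- ggplot(data = rct_df, aes(x = trt, y = npts, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  # choose our own axis range, tick marks and title
  scale_y_continuous(
    limits = c(0, 80),
    breaks = seq(0, 80, by = 20),
    name = "Number of Patients"
  ) +
  # choose our own color for each level of sex
  scale_fill_manual(
    values = c(Female = "#532354", Male = "#a7a9ac"),
    name = "Patient Sex"
  )

scale_plot
```

Naming the elements of values ties each color to a specific level, so the mapping does not depend on the order of the factor levels.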
Using the boxplot we made earlier, use scales (and maybe aesthetics) to customize the plot.
Hints: ?scale_x_discrete, ?scale_color_hue

ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(
    aes(color = continent),
    position = position_jitter(width = 0.2),
    alpha = 0.6
  ) +
  scale_x_discrete(name = "Continent") +
  ## guide = "none" replaces the deprecated guide = FALSE
  scale_color_hue(guide = "none")
By default, ggplot2 uses color scales that allow for the most difference between categories on a color wheel, or to show a spectrum of continuous values.
You can change these defaults in several ways, including:
scale_color_hue() or scale_color_gradient(): for example, change the gradient color from the default blue to green, or change the range of hues for a categorical variable
ColorBrewer palettes (scale_color_brewer() for categorical, scale_color_distiller() for continuous)
Named colors such as "blue" or hex colors (eg, #FAFAFA): scale_color_manual()
viridis color schemes. (These are included in ggplot2 itself as of version 3.0.0; the viridisLite package provides the underlying palettes.) These scales print well, even in black & white, and are built to be perceived by people with color blindness.
(All scale options above also apply to scale_fill_xxxx)
p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = country, size = pop), alpha = 0.5)

## guide = "none" replaces the deprecated guide = FALSE
p + scale_color_hue(guide = "none")

p + scale_color_manual(values = country_colors, guide = "none")

viridis

## Requires ggplot2 >= 3.0.0
p + scale_color_viridis_d(guide = "none")
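For a smaller number of categories than the gapminder countries, the difference between palettes is easier to see. The sketch below uses a made-up three-group data frame (`pal_df` is hypothetical) and swaps only the color scale between two otherwise identical plots.

```r
library(ggplot2)

# Hypothetical data with three categories
pal_df <- data.frame(
  x = 1:9,
  y = c(2, 3, 5, 4, 6, 8, 7, 9, 12),
  grp = rep(c("low", "mid", "high"), each = 3)
)

base_plot <- ggplot(data = pal_df, aes(x = x, y = y, color = grp)) +
  geom_point(size = 3)

# ColorBrewer qualitative palette for a categorical variable
brewer_plot <- base_plot + scale_color_brewer(palette = "Dark2")

# viridis palette for a discrete variable
viridis_plot <- base_plot + scale_color_viridis_d()

brewer_plot
viridis_plot
```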
facets 🔢

We can use facets to show the same visualization for related groups.

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  facet_wrap(~ continent) +
  geom_point() +
  geom_smooth()
facet_wrap(): You have one categorical variable
facet_grid(): You have two categorical variables, want to show each combination

Useful arguments (among others):
nrow, ncol (facet_wrap() only): if you want a specific layout
scales (both functions):
fixed: default; same for every panel
free_y, free_x, free: allow X and/or Y axis to change for each panel

facet_grid() example
First + last years available for each continent in the gapminder data

ggplot(
  data = subset(
    gapminder,
    year %in% c(1952, 2007) & !(continent == "Oceania")
  ),
  aes(x = gdpPercap, y = lifeExp)
) +
  facet_grid(year ~ continent) +
  geom_point() +
  geom_smooth()
Leaving scales consistent between panels is great for comparisons between panels (like here). If we are more interested in informing than comparison, we may want to let the scales vary by panel.
facet_grid() example

ggplot(
  data = subset(
    gapminder,
    year %in% c(1952, 2007) & !(continent == "Oceania")
  ),
  aes(x = gdpPercap, y = lifeExp)
) +
  facet_grid(year ~ continent, scales = "free_x") +
  geom_point() +
  geom_smooth()
The X axis scales for each column are still the same for the two rows, allowing us to see the differences between 1952 and 2007 clearly for each continent.
But the X axis scales are different from column to column; we can see the relationships for Africa without being affected by the larger GDPs found in Europe, for example.
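The same free-scale idea applies to facet_wrap(), which additionally lets you pick the panel layout. The sketch below uses simulated groups on very different scales (`panel_df` is made up for the example) and stacks the panels in one column with independent Y axes.

```r
library(ggplot2)

# Simulated data: three groups whose Y values differ by orders of magnitude
set.seed(2)
panel_df <- data.frame(
  grp = rep(c("small", "medium", "large"), each = 20),
  x = rep(1:20, times = 3)
)
panel_df$y <- panel_df$x * rep(c(1, 10, 100), each = 20) + rnorm(60)

wrap_plot <- ggplot(data = panel_df, aes(x = x, y = y)) +
  # one column of panels; each panel gets its own Y axis range
  facet_wrap(~ grp, ncol = 1, scales = "free_y") +
  geom_point()

wrap_plot
```

With the default fixed scales, the "small" and "medium" panels would be squashed near zero by the range of the "large" group.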
Using the original gapminder data for 1952 and 2007, update your boxplots of GDP by continent to show one column for each of those two years, and one row for each continent.
Hint: You can set the X aesthetic to always be 1.

ggplot(
  data = subset(gapminder, year %in% c(1952, 2007)),
  aes(x = 1, y = gdpPercap)
) +
  facet_grid(continent ~ year, scales = "free_y") +
  geom_boxplot() +
  geom_point()
Give your plots a different look
Control plot elements not related to data
Using ggplot2 themes, it is easy to give your plots a different look with one line.
p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point()
p + theme_bw()
p + theme_minimal()
themes

We also use themes to change elements of the plot that are not related to data:

p + theme_minimal()

p + theme_minimal() +
  theme(
    panel.border = element_rect(fill = NA, color = "#b756b9", size = 5)
  )
elements

Pieces of the theme are controlled by the element functions (examples):
element_line(): gridlines, axis lines
element_text(): axis labels, titles, captions
element_rect(): plot and panel backgrounds, facet strip backgrounds
element_blank(): can be used for any element to "make it disappear"
Use these functions to set aspects like sizes, colors, font faces...
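A compact sketch that uses each of the four element functions once is shown below. The data frame and the particular colors are arbitrary; the theme element names (panel.grid.minor, panel.grid.major, axis.title, panel.background) are standard ggplot2 theme arguments.

```r
library(ggplot2)

# Small made-up data set, just to have something to draw
theme_df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

themed_plot <- ggplot(data = theme_df, aes(x = x, y = y)) +
  geom_point() +
  theme_bw() +
  theme(
    # remove minor gridlines entirely
    panel.grid.minor = element_blank(),
    # lighten the major gridlines
    panel.grid.major = element_line(color = "gray85"),
    # make both axis titles bold
    axis.title = element_text(face = "bold"),
    # give the panel a very light gray background
    panel.background = element_rect(fill = "gray98")
  )

themed_plot
```

Because the theme() call comes after theme_bw(), its settings override the corresponding pieces of the complete theme rather than being overwritten by it.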
Give a boxplot we made earlier a different overall theme, then customize at least one element. You might make the strip.background a different color.

p + theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    strip.background = element_rect(fill = "gray90"),
    panel.background = element_rect(fill = NA, color = "gray90"),
    legend.position = "none"
  )
We may want to add a title to our plot, or a caption to give additional information about how something was defined. Maybe we want to name an aesthetic "Patient Sex" instead of "sex" but with less typing than scale_x_discrete(name = "Patient Sex").
labs() allows us to set these elements:
title
subtitle
caption
aesthetics (these will show up in the legend, or X/Y axis titles)

p + labs(x = "My X axis title")
Use labs() to add a plot title and change the Y axis title on the plot you just made.
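For reference, a sketch that sets several labs() elements at once is shown below, reusing the rct_df data from the position slides. All of the label text is invented for the example.

```r
library(ggplot2)

rct_df <- data.frame(
  trt = factor(c("A", "A", "B", "B")),
  sex = factor(rep(c("Male", "Female"), 2)),
  npts = c(52, 48, 65, 75)
)

labelled_plot <- ggplot(data = rct_df, aes(x = trt, y = npts, fill = sex)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "Enrollment by Treatment",       # plot title
    subtitle = "Hypothetical trial data",    # shown under the title
    caption = "Counts are illustrative.",    # shown below the plot
    x = "Treatment",                         # X axis title
    y = "Number of Patients",                # Y axis title
    fill = "Patient Sex"                     # legend title for the fill aesthetic
  )

labelled_plot
```

Each aesthetic used in the plot (here x, y, and fill) can be named by its aesthetic name inside labs(), which is what makes it shorter than writing out one scale_*() call per aesthetic.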
To control elements of the plot that represent data, but not in a way directly tied to our data, we can still use aesthetic qualities. However, we will set them outside the aes(...) function.
Example: We want all our points to be a certain color, or to make a line width thicker to be seen better in a presentation.

ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "#a7a9ac", alpha = 0.75, size = 1.25) +
  geom_smooth(
    fill = "#532354", color = "#b756b9",
    alpha = 0.2, size = 2
  ) +
  geom_rug(alpha = 0.3, color = "#532354")
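A common stumbling block is putting a literal color inside aes(). The sketch below, with a small made-up data frame, contrasts setting a color (outside aes) with accidentally mapping it (inside aes), and uses layer_data() to show which color each layer actually draws.

```r
library(ggplot2)

# Small made-up data set
set_df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))

# Setting: every point is drawn in this literal color, with no legend
set_plot <- ggplot(data = set_df, aes(x = x, y = y)) +
  geom_point(color = "#b756b9", size = 3)

# Mapping: the string is treated as a data value with a single level,
# so the default color scale picks the color and a legend appears
mapped_plot <- ggplot(data = set_df, aes(x = x, y = y)) +
  geom_point(aes(color = "#b756b9"), size = 3)

# Colors actually used by each layer
unique(layer_data(set_plot, 1)$colour)
unique(layer_data(mapped_plot, 1)$colour)
```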
gapminder 🌏

Create a publication-quality chart showing the relationship between gross domestic product and life expectancy, 1952 and 2007.

1) Create a subset of the data

## Only look at 1952, 2007
gap_sub <- subset(gapminder, year %in% c(1952, 2007))

2) Initialize our plot object

gdp_exp <- ggplot(data = gap_sub, aes(x = gdpPercap, y = lifeExp))

gdp_exp

facet by year and continent

gdp_exp <- gdp_exp +
  facet_grid(year ~ continent)
We want to use a different color for each country, and make our point sizes vary according to the countries' population size.

gdp_exp <- gdp_exp +
  geom_point(
    aes(color = country, size = pop),
    alpha = 0.6  ## to help overplotting
  )
scales

As we saw, a legend for color will not be helpful; there are too many countries.
However, a legend would be helpful for the population sizes.
We will use two scales to
1) Specify the colors we want, using scale_color_manual and country_colors, a named vector of hex colors that comes with the gapminder package

head(country_colors)

##          Nigeria            Egypt         Ethiopia Congo, Dem. Rep.
##        "#7F3B08"        "#833D07"        "#873F07"        "#8B4107"
##     South Africa            Sudan
##        "#8F4407"        "#934607"

2) Format a legend for our population sizes

gdp_exp <- gdp_exp +
  scale_color_manual(
    ## Manually specify colors
    values = country_colors,
    ## Turn off the legend for colors
    ## ("none" replaces the deprecated guide = FALSE)
    guide = "none"
  ) +
  scale_size(
    range = c(3, 7),
    labels = function(x){ scales::comma(x / 1000000) },
    name = "Population\n(x 1M)"
  )
We see that there is a lot of space on the X axis that may be unnecessary, possibly caused by one point. What point is it?

ggplot(data = gap_sub, aes(x = gdpPercap)) +
  geom_histogram()

The extreme outlier in 1952 is from Kuwait; looking at other years in the gapminder data, it seems to be a legitimate value, but including it is keeping us from seeing the other data as clearly.
We'll exclude it from our plot, but will make a note in a figure caption.
scale

## Save X axis title to a string so we can see the whole thing
xtitle <- "Gross Domestic Product per Capita\n(x $1,000 USD)"

gdp_exp <- gdp_exp +
  scale_x_continuous(
    limits = c(0, 55000),
    labels = function(x){ scales::dollar(x / 1000) },
    name = xtitle
  )

plottitle <- "GDP vs Life Expectancy, 1952 and 2007"
captitle <- sprintf(
  "Kuwait's 1952 values were excluded due to its extremely high GDP of $%s.\nAverage life expectancy at that time was %s.",
  format(
    round(subset(gap_sub, year == 1952 & country == "Kuwait")$gdpPercap),
    big.mark = ","
  ),
  round(subset(gap_sub, year == 1952 & country == "Kuwait")$lifeExp, 1)
)

gdp_exp <- gdp_exp +
  labs(
    title = plottitle,
    subtitle = "Source: gapminder.org/data",
    caption = captitle,
    y = "Life Expectancy (Years)"
  )
We want to make a few theme changes:

gdp_exp <- gdp_exp +
  theme_bw() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    axis.title.x = element_text(vjust = 0),
    plot.caption = element_text(vjust = 0, face = "italic"),
    strip.text = element_text(face = "bold", size = 12),
    legend.position = "bottom",
    legend.direction = "horizontal"
  )
Using any of these strategies (or others!), make changes to the boxplots we've been working with.
Some ideas:
Any other ideas are welcome!
The ggsave function allows us to easily save our figures in several formats. For example, you might want to create a PDF of the figure for a journal submission, but have a PNG for PowerPoint presentations.

ggsave(
  filename = "gapminder_gdplifeexp.pdf",
  plot = gdp_exp,
  device = "pdf",
  path = "figures/",
  width = 10, height = 7, units = "in"
)

ggsave(
  filename = "gapminder_gdplifeexp.png",
  plot = gdp_exp,
  device = "png",
  path = "figures/",
  width = 10, height = 7, units = "in"
)
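Depending on your ggplot2 version, ggsave() may stop with an error if the target directory does not exist yet, so it can help to create it first. The sketch below writes to a temporary directory so it runs anywhere; the plot, file name, and directory are placeholders for this example.

```r
library(ggplot2)

# Placeholder plot to save
save_df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 4))
save_plot <- ggplot(data = save_df, aes(x = x, y = y)) +
  geom_point()

# Create the output directory if needed (a temp directory for this sketch)
out_dir <- file.path(tempdir(), "figures")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# With no device argument, the format is inferred from the file extension
ggsave(
  filename = "example_plot.pdf",
  plot = save_plot,
  path = out_dir,
  width = 10, height = 7, units = "in"
)

# Confirm the file was written
file.exists(file.path(out_dir, "example_plot.pdf"))
```

In a real project you would replace out_dir with a path such as "figures/" inside the project directory.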
Save the results of your plot to a figures/ directory in two different formats.
ggplot2 reference page
ggplot2 extensions, for geoms that are not included in ggplot2 itself
ggthemr for building custom themes and color palettes (not currently on CRAN; install from Github)