ggplot2: data visualization system based on the grammar of graphics
We will cover: aesthetics, geoms, layering, scales/guides, facets, themes and other options, and ggsave().
Note: This is not an exhaustive tutorial; it is an overview of the features I use most often. Much more at ggplot2.tidyverse.org!
ggplot2
(Compare this to trying to remember all the arguments to par.)
ggplot(
  data = df,
  aes(x = xvar, y = yvar)
) +
  geom_point(stat = "identity") +
  scale_x_continuous(
    limits = c(min(df$xvar), max(df$xvar)),
    name = "X Axis"
  ) +
  scale_y_continuous(
    limits = c(min(df$yvar), max(df$yvar)),
    name = "Y Axis"
  )

with(df, plot(yvar ~ xvar))

ggplot( data = df, aes(x = xvar, y = yvar)) + geom_point()


ggplot(data = ..., ...)
Everything starts with a data.frame.
Anything that represents data on the plot must be within a data.frame.
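The slides never show how df was created, so here is a minimal sketch of a data.frame that fits the examples. The column names xvar and yvar come from the slides; the simulated values and the seed are assumptions made only for illustration.

```r
library(ggplot2)

# Hypothetical example data: `xvar` and `yvar` mirror the column names used
# in these slides, but the values are simulated here.
set.seed(1)
df <- data.frame(xvar = rnorm(50))
df$yvar <- 0.8 * df$xvar + rnorm(50, sd = 0.5)

# Data and aesthetic mappings are declared once, up front.
p <- ggplot(data = df, aes(x = xvar, y = yvar))
```

Printing p at this point draws an empty panel with axes, because no geom has been added yet.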
aesthetics
print(head(df, n = 3), digits = 2)
##    xvar  yvar
## 1 -0.34 -0.08
## 2 -1.64 -1.03
## 3 -1.06 -0.78
ggplot(data = df, aes(x = xvar, y = yvar))

aesthetics + geoms = 👯
geoms
geoms determine the shape of the data.
ggplot(data = df, aes(x = xvar, y = yvar)) + geom_point()

ggplot(data = df, aes(x = xvar, y = yvar)) + geom_line()

geoms...aesthetics
The aesthetics you need depend on the geom you want to show.
Examples:
- geom_point, geom_line each need only X, Y values
- geom_ribbon needs X, but rather than a single Y, it needs ymin and ymax
Look at the help files for these geoms and see what aesthetics each one needs.
library(ggplot2)
?geom_line
?geom_boxplot
?geom_bar
?geom_ribbon
Using the gapminder dataset from the year 2007, show the relationship between gdpPercap and lifeExp using aesthetics and geoms.
# install.packages("gapminder")
library(gapminder)
head(gapminder)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333       779
## 2 Afghanistan Asia       1957    30.3  9240934       821
## 3 Afghanistan Asia       1962    32.0 10267083       853
## 4 Afghanistan Asia       1967    34.0 11537966       836
## 5 Afghanistan Asia       1972    36.1 13079460       740
## 6 Afghanistan Asia       1977    38.4 14880372       786
gap2007 <- subset(gapminder, year == 2007)
p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp))
p + geom_point()

p + geom_line()
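The geoms...aesthetics notes mention that geom_ribbon needs ymin and ymax rather than a single Y. A minimal sketch of that requirement follows; the band_df data and its column names are hypothetical and simulated only for illustration.

```r
library(ggplot2)

# Hypothetical data: a curve with a lower and upper band around it.
band_df <- data.frame(x = seq(0, 10, by = 0.5))
band_df$y <- sin(band_df$x)
band_df$lower <- band_df$y - 0.3
band_df$upper <- band_df$y + 0.3

# geom_ribbon() needs x, ymin, and ymax; geom_line() needs x and y.
ribbon_plot <- ggplot(data = band_df, aes(x = x)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3) +
  geom_line(aes(y = y))

# Inspect the data ggplot2 prepared for the ribbon layer.
ribbon_data <- layer_data(ribbon_plot, 1)
head(ribbon_data[, c("x", "ymin", "ymax")])
```

layer_data() is a handy way to check which aesthetics a layer actually received.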

geoms + stats
Some geoms require the user to supply all the information needed to map each point.
geom_point, geom_line, geom_ribbon
Others use stats behind the scenes to summarize the data.
ggplot( data = df, aes(x = 1, y = yvar)) + geom_boxplot()


Often we won't need to call stats explicitly; geoms have excellent defaults that do much of the work for us!
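To see what a stat does behind the scenes, the sketch below pulls the computed summary out of a boxplot layer and compares its middle line with the median of the raw data. The simulated df is an assumption; only the plot call follows the slide.

```r
library(ggplot2)

# Hypothetical data, simulated for illustration.
set.seed(2)
df <- data.frame(yvar = rnorm(100))

box_plot <- ggplot(data = df, aes(x = 1, y = yvar)) +
  geom_boxplot()

# The boxplot layer holds one row of summary statistics, not 100 raw rows.
box_stats <- layer_data(box_plot, 1)
box_stats[, c("ymin", "lower", "middle", "upper", "ymax")]

# The middle line is the median of the raw values.
all.equal(box_stats$middle, median(df$yvar))
```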
We want to represent the same data, using a summary as well as the raw data.
We can do this by layering geoms.
ggplot( data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point()

ggplot( data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point() + geom_smooth()

ggplot( data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point() + geom_smooth() + geom_rug()

Summarize and show raw data for each country's gross domestic product per capita (gdpPercap).
For extra credit 😁, do this separately for each continent.
- What geoms would you use?
- What aesthetics would you use?
ggplot(data = gap2007, aes(x = continent, y = gdpPercap)) + geom_boxplot() + geom_point()

positions help us change the position of data a bit. Positions are still based on aesthetics, but sometimes it is helpful to modify those values.
For example, you may have many single points with the same value, or several groups contained within one value.
Common position functions:
- position_dodge(): vertical position stays the same; horizontal changes
- position_stack(): stacks bars on top of one another
- position_fill(): stacks bars and standardizes each to the same height
- position_jitter(): adds random noise to values to avoid overplotting
rct_df <- data.frame(
  trt = factor(c("A", "A", "B", "B")),
  sex = factor(rep(c("Male", "Female"), 2)),
  npts = c(52, 48, 65, 75)
)
ggplot( data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) + geom_bar( stat = "identity", position = position_dodge() )
position_dodge()
ggplot( data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) + geom_bar( stat = "identity", position = position_stack() )
position_stack()
ggplot( data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) + geom_bar( stat = "identity", position = position_fill() )
position_fill()
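As a rough check of what "standardizes each to the same height" means, the sketch below compares the bar heights after position_fill() with proportions computed by hand. It reuses the rct_df values from the slides; the prop column name is an assumption.

```r
library(ggplot2)

# Same data as the rct_df example in the slides.
rct_df <- data.frame(
  trt = factor(c("A", "A", "B", "B")),
  sex = factor(rep(c("Male", "Female"), 2)),
  npts = c(52, 48, 65, 75)
)

fill_plot <- ggplot(data = rct_df, aes(x = trt, y = npts, fill = sex)) +
  geom_bar(stat = "identity", position = position_fill())

# After position_fill(), the stacked bars for each treatment top out at 1.
fill_data <- layer_data(fill_plot, 1)
top_of_stack <- tapply(fill_data$ymax, as.numeric(fill_data$x), max)
top_of_stack

# The same proportions computed by hand (hypothetical `prop` column).
rct_df$prop <- rct_df$npts / ave(rct_df$npts, rct_df$trt, FUN = sum)
rct_df
```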
Because positions are functions, we can add arguments to control them further.
ggplot( data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) + geom_bar( stat = "identity", position = position_dodge2(padding = 0.2) )

But if we want the default position settings, we can use shortcuts:
ggplot( data = rct_df, aes(x = trt, y = npts, group = sex, fill = sex)) + geom_bar( stat = "identity", position = "dodge2" )

Take the boxplot we made earlier and use a position to reduce the overplotting of the raw data.
ggplot( data = gap2007, aes(x = continent, y = gdpPercap)) + geom_boxplot() + geom_point()

ggplot( data = gap2007, aes(x = continent, y = gdpPercap)) + geom_boxplot() + geom_point( position = "jitter" )

ggplot( data = gap2007, aes(x = continent, y = gdpPercap)) + geom_boxplot() + geom_point( position = position_jitter(width = 0.25) )
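Because jitter is random, the points move each time the plot is drawn. A minimal sketch of making it repeatable with the seed argument of position_jitter() follows; the pts data is hypothetical.

```r
library(ggplot2)

# Hypothetical data: many points sharing a few discrete x values.
pts <- data.frame(
  grp = rep(c("a", "b"), each = 20),
  val = rep(1:4, times = 10)
)

# `seed` makes the random jitter repeatable; `height = 0` leaves y untouched.
jit_plot <- ggplot(data = pts, aes(x = grp, y = val)) +
  geom_point(position = position_jitter(width = 0.25, height = 0, seed = 123))

# Building the layer twice gives the same jittered x positions.
first_draw <- layer_data(jit_plot, 1)
second_draw <- layer_data(jit_plot, 1)
identical(first_draw$x, second_draw$x)
```

Setting height = 0 is a design choice worth considering whenever the Y values are the quantity of interest, since vertical jitter would misstate them.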

Did you notice that, so far, we have only explicitly specified our data and aesthetics in the initial ggplot() call?
Even when we had three separate layers!
ggplot2 uses inheritance. This means that each layer uses the same data and aesthetics set in ggplot(...), unless we tell it otherwise.
We may show the point estimate and confidence interval for a continuous variable from a linear regression model using geom_line and geom_ribbon, then show the original, unadjusted data using geom_point.

ggplot(
  data = predvals,
  aes(x = pointest, y = adjvalue)
) +
  ## Use inherited data;
  ## specify aesthetics
  geom_ribbon(
    aes(ymin = lcl, ymax = ucl)
  ) +
  ## Use inherited data + aesthetics
  geom_line() +
  ## Add raw data
  geom_point(
    aes(x = covar, y = orgvalue),
    data = orgdata
  )
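The slides do not show how predvals and orgdata were built, so the following is only a sketch under stated assumptions: the raw data is simulated, the model is a simple lm(), and a single covar column is used on the X axis for both data frames. The names adjvalue, lcl, ucl, covar, and orgvalue follow the slide.

```r
library(ggplot2)

# Hypothetical raw data; column names follow the slide's orgdata example.
set.seed(3)
orgdata <- data.frame(covar = runif(40, min = 0, max = 10))
orgdata$orgvalue <- 2 + 0.5 * orgdata$covar + rnorm(40)

# Fit a simple linear model and predict over a grid with confidence limits.
fit <- lm(orgvalue ~ covar, data = orgdata)
predvals <- data.frame(covar = seq(0, 10, length.out = 25))
ci <- predict(fit, newdata = predvals, interval = "confidence")
predvals$adjvalue <- ci[, "fit"]
predvals$lcl <- ci[, "lwr"]
predvals$ucl <- ci[, "upr"]

model_plot <- ggplot(data = predvals, aes(x = covar, y = adjvalue)) +
  geom_ribbon(aes(ymin = lcl, ymax = ucl), alpha = 0.3) +  # inherits data and x
  geom_line() +                                            # inherits data, x, y
  geom_point(aes(y = orgvalue), data = orgdata)            # own data, own y
```

Only the point layer overrides the inherited data; the ribbon and line layers reuse what was set in ggplot().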
What data is mapped to which plot characteristic: aes(x = ...)
How to map data to plot characteristics: scale_x_...(...)
aes(thetics)
Put gdpPercap on Y axis
ggplot(
  data = gap2007,
  aes(x = continent, y = gdpPercap)
) +
  geom_boxplot() +
  geom_point() +
  scale_y_continuous(
    limits = c(0, 5000),
    breaks = seq(0, 5000, 1025),
    labels = scales::comma,
    name = "GDP per Capita"
  )
scales
Note: ggplot2 automatically set gridlines at our break points!
aes(thetics)
"Fill" the bars with colors by sex
ggplot(
  data = rct_df,
  aes(x = trt, y = npts, group = sex, fill = sex)
) +
  geom_bar(
    stat = "identity",
    position = "dodge2"
  ) +
  scale_fill_hue(
    h = c(90, 270), l = 40, ## change *hues*, *lightness*
    name = "Patient Sex"    ## change *name*
  )
scales
Note how scales control the legend!
scales
Scales generally correspond to aesthetics. Some common scale types:
- scale_[x, y]_[continuous/discrete]
- scale_[colour, fill]_[many options!]
- scale_size_..., scale_shape_..., scale_alpha_...
This is not an exhaustive list. See the ggplot2 reference pages for more options.
Note: For color scales and aesthetics, you can use either color or colour.
scales have intelligent default values, but you can use different scale types to set specific values you choose.
Using the boxplot we made earlier, use scales (and maybe aesthetics) to
?scale_x_discrete
?scale_color_hue
ggplot(
  data = gap2007,
  aes(x = continent, y = gdpPercap)
) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(
    aes(color = continent),
    position = position_jitter(width = 0.2),
    alpha = 0.6
  ) +
  scale_x_discrete(name = "Continent") +
  ## guide = "none" hides the legend; guide = FALSE is deprecated in recent ggplot2
  scale_color_hue(guide = "none")

By default, ggplot2 uses color scales that allow for the most difference between categories on a color wheel, or to show a spectrum of continuous values.
You can change these defaults in several ways, including:
- Adjusting the defaults with scale_color_hue() or scale_color_gradient(): for example, change the gradient color from the default blue to green, or change the range of hues for a categorical variable
- ColorBrewer palettes (scale_color_brewer() for categorical, scale_color_distiller() for continuous)
- Named colors like "blue" or hex colors (eg, #FAFAFA): scale_color_manual()
- viridis color schemes. (These are not currently included in ggplot2 itself, but will be in the next version released to CRAN this summer. You can install the viridisLite package to use them now.) These scales print well, even in black & white, and are built to be perceived by people with color blindness.
(All scale options above also apply to scale_fill_xxxx)
p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point(aes(color = country, size = pop), alpha = 0.5)
## guide = "none" hides the legend; guide = FALSE is deprecated in recent ggplot2
p + scale_color_hue(guide = "none")

p + scale_color_manual(values = country_colors, guide = "none")

viridis
library(viridisLite)
## or install development version of ggplot2 from Github
p + scale_color_viridis_d(guide = "none")
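scale_color_manual() looks colors up by name when it is given a named vector, which is how country_colors works. A small sketch with hypothetical data and hypothetical hex colors:

```r
library(ggplot2)

# Hypothetical data and colors, for illustration only.
grp_df <- data.frame(
  x = 1:4,
  y = c(2, 4, 3, 5),
  grp = c("control", "treated", "control", "treated")
)
grp_colors <- c(control = "#1b9e77", treated = "#d95f02")

manual_plot <- ggplot(data = grp_df, aes(x = x, y = y, color = grp)) +
  geom_point(size = 3) +
  scale_color_manual(values = grp_colors, name = "Group")

# Each point's colour is looked up by name in `grp_colors`.
point_colours <- layer_data(manual_plot, 1)$colour
point_colours
```

Naming the vector protects against colors being assigned to the wrong group if the factor levels change order.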

facets 🔢
We can use facets to show the same visualization for related groups.
ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) + facet_wrap(~ continent) + geom_point() + geom_smooth()

facet_wrap()
You have one categorical variable
facet_grid()
You have two categorical variables, want to show each combination
Useful arguments (among others):
- nrow, ncol (facet_wrap() only): if you want a specific layout
- scales (both functions)
  - fixed: default; same for every panel
  - free_y, free_x, free: allow X and/or Y axis to change for each panel
facet_grid() example
First + last years available for each continent in the gapminder data
ggplot( data = subset( gapminder, year %in% c(1952, 2007) & !(continent == "Oceania") ), aes(x = gdpPercap, y = lifeExp)) + facet_grid(year ~ continent) + geom_point() + geom_smooth()


Leaving scales consistent between panels is great for comparisons between panels (like here). If we are more interested in informing than comparison, we may want to let the scales vary by panel.
facet_grid() example
ggplot(
  data = subset(
    gapminder,
    year %in% c(1952, 2007) & !(continent == "Oceania")
  ),
  aes(x = gdpPercap, y = lifeExp)
) +
  facet_grid(
    year ~ continent,
    scales = "free_x"
  ) +
  geom_point() +
  geom_smooth()

The X axis scales for each column are still the same for the two rows, allowing us to see the differences between 1952 and 2007 clearly for each continent.
But the X axis scales are different from column to column; we can see the relationships for Africa without being affected by the larger GDPs found in Europe, for example.
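For comparison, here is a minimal sketch of the same ideas with facet_wrap(), using nrow to force a single row and scales = "free_x" to let each panel choose its own X range. It assumes the gapminder package is installed, as in the earlier slides.

```r
library(ggplot2)
library(gapminder)

gap2007 <- subset(gapminder, year == 2007)

# One panel per continent, laid out in a single row, each with its own X range.
wrap_plot <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) +
  facet_wrap(~ continent, nrow = 1, scales = "free_x") +
  geom_point()

# Count the panels that contain data.
n_panels <- length(unique(layer_data(wrap_plot, 1)$PANEL))
n_panels
```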
Using the original gapminder data for 1952 and 2007, update your boxplots of GDP by continent to show one column for each of those two years, and one row for each continent.
Hint: You can set the X aesthetic to always be 1.
ggplot( data = subset( gapminder, year %in% c(1952, 2007) ), aes( x = 1, y = gdpPercap )) + facet_grid( continent ~ year, scales = "free_y" ) + geom_boxplot() + geom_point()

Give your plots a different look
Control plot elements not related to data
Using ggplot2 themes, it is easy to give your plots a different look with one line.
p <- ggplot(data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point()
p + theme_bw()

p + theme_minimal()

themes
We also use themes to change elements of the plot that are not related to data:
p + theme_minimal() + theme( panel.border = element_rect( fill = NA, color = "#b756b9", size = 5 ) )

elements
Pieces of the theme are controlled by the element functions (examples):
- element_line(): gridlines, axis lines
- element_text(): axis labels, titles, captions
- element_rect(): plot and panel backgrounds, facet strip backgrounds
- element_blank(): can be used for any element to "make it disappear"
Use these functions to set aspects like sizes, colors, font faces...
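A short sketch that uses each of the four element functions once. The simulated data and the particular colors and sizes are assumptions chosen only to make the effect visible.

```r
library(ggplot2)

# Hypothetical data, simulated for illustration.
set.seed(4)
df <- data.frame(xvar = rnorm(30), yvar = rnorm(30))

themed_plot <- ggplot(data = df, aes(x = xvar, y = yvar)) +
  geom_point() +
  theme_bw() +
  theme(
    panel.grid.minor = element_blank(),                   # remove minor gridlines
    panel.grid.major = element_line(color = "gray85"),    # lighten major gridlines
    axis.title = element_text(face = "bold", size = 12),  # emphasize axis titles
    plot.background = element_rect(fill = "white", color = NA)
  )
```

Adding theme(...) after a complete theme such as theme_bw() matters: a complete theme added later would overwrite these tweaks.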
Give a boxplot we made earlier a different overall theme, then customize at least one element. You might...
- make the strip.background a different color
p +
  theme_minimal() +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    strip.background = element_rect(fill = "gray90"),
    panel.background = element_rect(fill = NA, color = "gray90"),
    legend.position = "none"
  )

We may want to add a title to our plot, or a caption to give additional information about how something was defined. Maybe we want to name an aesthetic "Patient Sex" instead of "sex" but with less typing than scale_x_discrete(name = "Patient Sex").
labs() allows us to set these elements:
- title
- subtitle
- caption
- aesthetics (these will show up in the legend, or X/Y axis titles)
p + labs(x = "My X axis title")
Use labs() to add a plot title and change the Y axis title on the plot you just made.
To control elements of the plot that represent data, but not in a way directly tied to our data, we can still use aesthetic qualities. However, we will set them outside the aes(...) function.
Example: We want all our points to be a certain color, or to make a line width thicker to be seen better in a presentation.
ggplot( data = gap2007, aes(x = gdpPercap, y = lifeExp)) + geom_point( color = "#a7a9ac", alpha = 0.75, size = 1.25 ) + geom_smooth( fill = "#532354", color = "#b756b9", alpha = 0.2, size = 2 ) + geom_rug( alpha = 0.3, color = "#532354" )
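The difference between setting an aesthetic and mapping it is easy to miss, so here is a small sketch with simulated (hypothetical) data that puts the two side by side.

```r
library(ggplot2)

# Hypothetical data, simulated for illustration.
set.seed(5)
df <- data.frame(xvar = rnorm(30), yvar = rnorm(30))
base <- ggplot(data = df, aes(x = xvar, y = yvar))

# Set: every point is drawn in blue, and no legend is created.
set_plot <- base + geom_point(color = "blue")

# Mapped: "blue" is treated as a data value (a one-level category), so the
# color scale picks the color and a legend appears.
mapped_plot <- base + geom_point(aes(color = "blue"))

set_colour <- unique(layer_data(set_plot, 1)$colour)
mapped_colour <- unique(layer_data(mapped_plot, 1)$colour)
set_colour
mapped_colour
```

If points come out in an unexpected color with a legend labelled "blue", a constant has probably ended up inside aes().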

gapminder 🌏
Create a publication-quality chart showing the relationship between gross domestic product and life expectancy, 1952 and 2007.
1) Create a subset of the data
## Only look at 1952, 2007
gap_sub <- subset(gapminder, year %in% c(1952, 2007))
2) Initialize our plot object
gdp_exp <- ggplot( data = gap_sub, aes(x = gdpPercap, y = lifeExp))
gdp_exp

facet by year and continent
gdp_exp <- gdp_exp + facet_grid(year ~ continent)

We want to use a different color for each country, and make our point sizes vary according to the countries' population size.
gdp_exp <- gdp_exp +
  geom_point(
    aes(color = country, size = pop),
    alpha = 0.6 ## to help overplotting
  )

scales
As we saw, a legend for color will not be helpful; there are too many countries.
However, a legend would be helpful for the population sizes.
We will use two scales to
1) Specify the colors we want, using scale_color_manual and country_colors, a named vector of hex colors that comes with the gapminder package
head(country_colors)
##          Nigeria            Egypt         Ethiopia Congo, Dem. Rep.
##        "#7F3B08"        "#833D07"        "#873F07"        "#8B4107"
##     South Africa            Sudan
##        "#8F4407"        "#934607"
2) Format a legend for our population sizes
scales
gdp_exp <- gdp_exp +
  scale_color_manual(
    ## Manually specify colors
    values = country_colors,
    ## Turn off the legend for colors
    ## (guide = FALSE is deprecated in recent ggplot2)
    guide = "none"
  ) +
  scale_size(
    range = c(3, 7),
    labels = function(x){ scales::comma(x / 1000000) },
    name = "Population\n(x 1M)"
  )
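The labels argument accepts any function that turns break values into strings. To see what the function passed to scale_size() produces, it can be tried on its own; pop_label is a hypothetical name used only here, and the population values are made up.

```r
# Same labelling function as in the scale_size() call, given a name here
# (`pop_label` is a hypothetical name) so it can be tried on its own.
pop_label <- function(x) {
  scales::comma(x / 1000000)
}

# Made-up population values: 1 million, 50 million, 1.3 billion.
pop_labels <- pop_label(c(1000000, 50000000, 1300000000))
pop_labels
```

Dividing by one million in the label function changes only the text in the legend; the point sizes are still driven by the raw pop values.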

We see that there is a lot of space on the X axis that may be unnecessary, possibly caused by one point. What point is it?
ggplot( data = gap_sub, aes(x = gdpPercap)) + geom_histogram()


The extreme outlier in 1952 is from Kuwait; looking at other years in the gapminder data, it seems to be a legitimate value, but including it is keeping us from seeing the other data as clearly.
We'll exclude it from our plot, but will make a note in a figure caption.
scale
## Save X axis title to a string so we can see the whole thing
xtitle <- "Gross Domestic Product per Capita\n(x $1,000 USD)"
gdp_exp <- gdp_exp +
  scale_x_continuous(
    limits = c(0, 55000),
    labels = function(x){ scales::dollar(x / 1000) },
    name = xtitle
  )

plottitle <- "GDP vs Life Expectancy, 1952 and 2007"
captitle <- sprintf(
  "Kuwait's 1952 values were excluded due to its extremely high GDP of $%s.\nAverage life expectancy at that time was %s.",
  format(
    round(subset(gap_sub, year == 1952 & country == "Kuwait")$gdpPercap),
    big.mark = ","
  ),
  round(subset(gap_sub, year == 1952 & country == "Kuwait")$lifeExp, 1)
)
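The caption is assembled with sprintf(), format(), and round(). A stripped-down sketch of the same pattern, using made-up numbers rather than the Kuwait values, shows what each piece contributes.

```r
# Hypothetical values, used only to show how the caption string is assembled.
example_gdp <- 12345.678
example_life <- 55.5649

example_caption <- sprintf(
  "GDP of $%s.\nAverage life expectancy at that time was %s.",
  format(round(example_gdp), big.mark = ","),  # rounds, then adds a thousands separator
  round(example_life, 1)                       # one decimal place
)
cat(example_caption)
```

Building the caption from the data, rather than typing the numbers, keeps the text correct if the subset ever changes.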
gdp_exp <- gdp_exp + labs( title = plottitle, subtitle = "Source: gapminder.org/data", caption = captitle, y = "Life Expectancy (Years)" )

We want to adjust the overall theme and a few individual elements:
gdp_exp <- gdp_exp + theme_bw() + theme( plot.title = element_text( face = "bold", size = 16 ), axis.title.x = element_text( vjust = 0 ), plot.caption = element_text( vjust = 0, face = "italic" ), strip.text = element_text( face = "bold", size = 12 ), legend.position = "bottom", legend.direction = "horizontal" )

Using any of these strategies (or others!), make changes to the boxplots we've been working with.
Some ideas:
Any other ideas are welcome!
The ggsave function allows us to easily save our figures in several formats. For example, you might want to create a PDF of the figure for a journal submission, but have a PNG for PowerPoint presentations.
ggsave( filename = "gapminder_gdplifeexp.pdf", gdp_exp, device = "pdf", path = "figures/", width = 10, height = 7, units = "in")
ggsave( filename = "gapminder_gdplifeexp.png", gdp_exp, device = "png", path = "figures/", width = 10, height = 7, units = "in")
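ggsave() writes into path, so that directory needs to exist before saving. The sketch below creates it first; the plot, the file names, and the use of a temporary directory in place of figures/ are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical plot and output location; a temporary directory stands in
# for the figures/ folder so nothing is written to your project.
set.seed(6)
df <- data.frame(xvar = rnorm(30), yvar = rnorm(30))
demo_plot <- ggplot(data = df, aes(x = xvar, y = yvar)) + geom_point()

fig_dir <- file.path(tempdir(), "figures")
dir.create(fig_dir, showWarnings = FALSE, recursive = TRUE)

# One call per format; the device is inferred from the file extension.
ggsave(filename = "demo_plot.pdf", plot = demo_plot, path = fig_dir,
       width = 10, height = 7, units = "in")

# Skip the PNG if this R build cannot write PNG files.
if (capabilities("png")) {
  ggsave(filename = "demo_plot.png", plot = demo_plot, path = fig_dir,
         width = 10, height = 7, units = "in")
}

list.files(fig_dir)
```

Passing the plot object explicitly avoids relying on whichever plot happened to be displayed last.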
Save the results of your plot to a figures/ directory in two different formats.
- ggplot2 reference page
- ggplot2 extensions, for geoms that are not included in ggplot2 itself
- ggthemr for building custom themes and color palettes (not currently on CRAN; install from Github)