In this lesson we will dive into making common graphics with ggplot2. This approach follows The R Graphics Cookbook by Winston Chang.
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
You will likely find RStudio’s Data Visualization Cheat Sheet helpful as a quick reference. If you want to learn more about ggplot2 after this lesson, the documentation has some good suggestions. ggplot2: Elegant Graphics for Data Analysis is the definitive book on the subject.
- The data is what we want to visualize. It consists of variables, which are stored as columns in a data frame.
- geoms are the geometric objects that are drawn to represent the data, such as bars, lines, and points.
- Aesthetic attributes, or aesthetics, are visual properties of geoms, such as x and y position, line color, point shapes, etc.
- There are mappings from data values to aesthetics.
- scales control the mapping from the values in the data space to values in the aesthetic space. A continuous y scale maps larger numerical values to vertically higher positions in space.
- guides show the viewer how to map the visual properties back to the data space. The most commonly used guides are the tick marks and labels on an axis.
Humans distinguish differences in line, shape, and color without much processing effort, and data visualization takes advantage of this to help readers understand data more easily. When most of the data falls in a narrow range, for example life expectancy between 40 and 80 years, coord_cartesian(ylim = c(40, 80)) zooms in on that region so that individual lines are easier to see.
The ggplot() function is used to initialize the basic graph structure, then we add to it. The basic idea is that you specify different parts of the plot and add them together using the + operator.
We will start with a blank plot.
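A minimal sketch of the blank plot, assuming the lesson's data is the gapminder data frame from the gapminder package (an assumption; the text here does not name the data set). With data but no geoms, ggplot2 draws only an empty panel.

```r
library(ggplot2)
library(gapminder)  # assumed data source: provides the `gapminder` data frame

# Initialize a plot with data but no geoms; printing it draws an empty panel.
p <- ggplot(gapminder)
p
```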
Geometric objects are the actual marks we put on a plot. Examples include points (geom_point), lines (geom_line), and bars (geom_bar).
To visualize data a plot should have at least one geom; there is no upper limit. You can add a geom to a plot using the + operator. Evaluating the following line of code will produce an informative error message.
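One way to reproduce such an error is sketched here, again assuming the gapminder data frame from the gapminder package. The geom is added without any aesthetic mappings, so ggplot2 should complain about the missing x and y aesthetics when the plot is built. The tryCatch() wrapper is only there so the script keeps running and the message can be inspected.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# geom_point() is added without telling ggplot2 which variables to use.
p <- ggplot(gapminder) + geom_point()

# The error is raised when the plot is built (printing triggers the same step),
# not when the plot object is defined.
msg <- tryCatch(
  {
    ggplot_build(p)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```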
Each type of geom usually has a required set of aesthetics to be set, and usually accepts only a subset of all aesthetics. Refer to the help pages to see what mappings each geom accepts, for instance help(geom_point). Aesthetic mappings are set with the aes() function. Examples include:
- position (x and y)
- color (“outside” color)
- fill (“inside” color)
- shape (of points)
To create a scatter plot we need to use the aes() function to tell ggplot which variable should be used for the x-coordinate of the points and which variable should be used for the y-coordinate of the points…
How has life expectancy changed over time?
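A possible scatter plot for this question, assuming the gapminder data frame from the gapminder package and its column names year and lifeExp (both are assumptions, not stated in the text):

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Map year to the x position and life expectancy to the y position.
p <- ggplot(gapminder) +
  geom_point(aes(x = year, y = lifeExp))
p
```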
Use geom_jitter() to deal with over-plotting.
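A sketch of the jittered version, under the same assumptions (gapminder package, columns year and lifeExp). The width, height, and alpha values are illustrative choices, not values from the lesson.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Spread points horizontally so observations from the same year overlap less.
# height = 0 leaves the life expectancy values themselves unchanged.
p <- ggplot(gapminder) +
  geom_jitter(aes(x = year, y = lifeExp), width = 1.5, height = 0, alpha = 0.4)
p
```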
How is life expectancy related to GDP per capita?
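One way to look at this relationship, assuming the gapminder data frame with columns gdpPercap and lifeExp. The log-scaled x axis is an addition made here because GDP per capita is strongly right-skewed; the lesson itself may have used a linear axis.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Scatter plot of life expectancy against GDP per capita.
# scale_x_log10() is an illustrative choice, not taken from the lesson.
p <- ggplot(gapminder) +
  geom_point(aes(x = gdpPercap, y = lifeExp), alpha = 0.4) +
  scale_x_log10()
p
```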
What does the distribution of life expectancy look like?
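A histogram is one natural answer. The sketch assumes the gapminder data frame and its lifeExp column; the bin width of 2 years is an arbitrary illustrative choice.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# A histogram needs only an x aesthetic; the counts are computed by ggplot2.
p <- ggplot(gapminder) +
  geom_histogram(aes(x = lifeExp), binwidth = 2)
p
```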
Try plotting a histogram of GDP per capita,
How has life expectancy changed in Afghanistan over time?
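A line plot for a single country could look like the following, assuming the gapminder data frame with columns country, year, and lifeExp. The helper name afghanistan is introduced here for illustration.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Keep only the rows for one country, then connect its observations over time.
afghanistan <- subset(gapminder, country == "Afghanistan")

p <- ggplot(afghanistan) +
  geom_line(aes(x = year, y = lifeExp))
p
```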
How has GDP per capita changed in Afghanistan over time?
How has life expectancy changed in each country over time?
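To draw one line per country, the group aesthetic can be used. This is a sketch under the same assumptions (gapminder package, columns country, year, lifeExp); the transparency value is an illustrative choice.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# group = country tells geom_line() to draw a separate line for each country
# instead of connecting all observations into a single zig-zag line.
p <- ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country), alpha = 0.4)
p
```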
Do these time trends differ between continents?
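One way to compare continents is to color the lines by continent and give each continent its own panel. The sketch assumes the gapminder data frame with a continent column; using facets in addition to color is a choice made here, not necessarily the lesson's.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# One line per country, colored by continent, with one panel per continent.
p <- ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country, color = continent)) +
  facet_wrap(~ continent)
p
```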
Do the trends for GDP per capita look similar?
Does the relationship between life expectancy and time look linear? The first layer in this graph shows all the data points; the second layer shows a smoothed trend line and its confidence interval.
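A two-layer plot of this kind might be written as follows, assuming the gapminder data frame with columns year and lifeExp. The smoothing method is set to loess explicitly so the example does not depend on the default ggplot2 picks for a data set of this size; that choice is an assumption.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Aesthetics set in ggplot() are shared by both layers.
p <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point(alpha = 0.3) +                       # layer 1: all data points
  geom_smooth(method = "loess", formula = y ~ x)  # layer 2: trend line and confidence band
p
```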
If you change the smoothing method to lm, does the linear model look reasonable?
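Switching to a linear model only requires changing the method argument of geom_smooth(). The sketch assumes the gapminder data frame with columns year and lifeExp.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# method = "lm" fits a straight line (ordinary least squares) instead of a smoother.
p <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)
p
```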
What does the joint distribution of GDP per capita and life expectancy look like?
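A two-dimensional histogram is one way to show a joint distribution. The lesson may have used a different geom, so treat this as a sketch that assumes the gapminder data frame, columns gdpPercap and lifeExp, and an illustrative 30 bins per axis.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Count observations in rectangular bins; fill color encodes the count per bin.
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_bin_2d(bins = 30) +
  scale_x_log10()  # log scale is an illustrative choice for the skewed GDP values
p
```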
Does this relationship change over time?
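Faceting by year gives one panel per time point, which makes changes over time visible. This sketch assumes the gapminder data frame with columns gdpPercap, lifeExp, and year.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# facet_wrap(~ year) repeats the same scatter plot once for each year in the data.
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(alpha = 0.4) +
  scale_x_log10() +
  facet_wrap(~ year)
p
```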
There are a couple of ways to save figures made with ggplot2.
The easiest way to save a graph is to export directly from the RStudio Plots panel, by clicking on Export when the image is plotted. This will give you the option of saving the plot as an image or as a PDF. The ggsave() function from the ggplot2 package is the way to go if you want to create files programmatically.
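A minimal sketch of saving a plot with ggsave(), assuming the gapminder data frame. The file name lifeExp_histogram.png, the temporary directory, and the size and resolution values are hypothetical choices made for this example.

```r
library(ggplot2)
library(gapminder)  # assumed data source

p <- ggplot(gapminder) +
  geom_histogram(aes(x = lifeExp), binwidth = 2)

# Hypothetical output path; the file type is inferred from the extension.
out_file <- file.path(tempdir(), "lifeExp_histogram.png")

# width and height are in inches by default; dpi sets the resolution.
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 150)
```

Passing the plot explicitly with the plot argument avoids relying on ggsave()'s default of saving the last plot displayed.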
Use ggsave() to create a file named gdpPercap.png that contains a bar chart of GDP per capita.
A Final Example
Below is an example to show off a bunch of ggplot2’s features at once. Notice that different layers in a plot can use different data (the text labels are created using a smaller set of data).
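The lesson's original example is not reproduced here; the following is a stand-in sketch in the same spirit, assuming the gapminder data frame from the gapminder package. The helper names latest and labelled, the choice of the five most populous countries, and all styling values are illustrative assumptions. Note how geom_text() receives its own, smaller data set.

```r
library(ggplot2)
library(gapminder)  # assumed data source

# Points use the most recent year in the data.
latest <- subset(gapminder, year == max(year))

# A smaller data set for the text labels: the five most populous countries.
labelled <- head(latest[order(-latest$pop), ], 5)

p <- ggplot(latest, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(aes(color = continent, size = pop), alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "grey40") +
  geom_text(data = labelled, aes(label = country), vjust = -1, size = 3) +
  scale_x_log10() +
  labs(
    x = "GDP per capita (log scale)",
    y = "Life expectancy (years)",
    color = "Continent",
    size = "Population"
  ) +
  theme_minimal()
p
```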