- Map variables to graphic aesthetics to control elements such as color, shape, and size.
- Apply scales to modify the colors and plot ranges of a visualization.
Let’s again look at a subset of the data that Hans Rosling used in the video I showed on the first day of class.
Last time we saw how to make plots using the grammar of graphics, such as this scatter plot:
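The code for that plot is not reproduced here. A minimal sketch of it, assuming a small data frame named hans (a hypothetical name, filled with made-up values rather than the class data), could look like this:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81)
)

# First line: map gdp_per_cap to the x-axis and life_exp to the y-axis.
# Second line: draw one point per row of the data.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point()
p
```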
We discussed how, in the first line, the variable gdp_per_cap is mapped to the x-axis and life_exp is mapped to the y-axis. One powerful feature of the grammar of graphics is the ability to map variables onto other graphical parameters. These are called “aesthetics” (that is what the aes() function stands for), and we already saw one example last time.
For example, we can change the color of the points to correspond to a variable in the dataset like this:
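As a rough sketch of a color mapping, assuming a toy data frame named hans with a categorical column called continent (the data frame name, the column name, and all values are stand-ins, not the class data):

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81),
  continent   = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# Mapping continent to color inside aes() gives each continent its own color
# and adds a legend automatically.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = continent))
p
```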
We can also map a continuous variable to color, though the default scale is not very nice (more on this in a moment).
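A sketch of a continuous color mapping follows. Here I reuse life_exp as the colored variable purely for illustration, and the data frame hans with its values is a made-up stand-in; the original plot may well have used a different variable.

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81)
)

# A numeric variable mapped to color gets a continuous gradient
# instead of one distinct color per category.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = life_exp))
p
```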
We could also change the size of the point to match the population. Note that R writes the population key in scientific notation (2.5e+08 is the same as 2.5 times 10 to the power of eight).
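A minimal sketch of the size mapping, assuming the population lives in a column called pop (an assumed name) of a toy data frame hans with invented values:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81),
  pop         = c(2.5e8, 4.0e7, 1.2e9, 9.0e7, 6.0e7, 3.0e8)
)

# Larger populations are drawn as larger points; ggplot2 picks the sizes.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(size = pop))
p
```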
Or, finally, we could change both the size and color.
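Combining the two is just a matter of listing both mappings inside aes(). A sketch, again with a toy data frame hans whose continent and pop columns and values are assumptions for illustration:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81),
  pop         = c(2.5e8, 4.0e7, 1.2e9, 9.0e7, 6.0e7, 3.0e8),
  continent   = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# Two aesthetics at once: color follows continent, size follows population.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = continent, size = pop))
p
```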
Notice that R takes care of the specific colors and sizes. All we do is indicate which variable is mapped to which aesthetic.
I rarely do this in practice, but it is also possible to map a variable to a shape:
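A sketch of a shape mapping, assuming a categorical continent column in a toy data frame hans (names and values are illustrative stand-ins):

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81),
  continent   = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# Each continent gets its own point shape (circle, triangle, and so on).
# Shape only makes sense for categorical variables.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(shape = continent))
p
```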
A very powerful feature of the grammar of graphics is the ability to map a variable to a visual aesthetic such as the x- and y-axes or the color and shape. In some cases, though, you may just want to change an aesthetic to a fixed value for all points. This can be done as well by specifying the aesthetic outside of the aes() function. For example, here I’ll change all of the points to be blue:
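A minimal sketch of a fixed aesthetic, using a toy data frame hans with invented values in place of the class data:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81)
)

# color is set outside aes(), so it is a fixed value applied to every point.
# No legend is drawn because no variable is involved.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(color = "blue")
p
```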
R won’t give an error if I put the same code inside of the aes function. Watch this:
What’s happening here?!
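Here is a sketch of that experiment on a toy data frame hans (invented values). Inside aes(), ggplot2 reads "blue" as data: a constant variable whose only value is the string "blue". It then assigns that single category the first color of its default discrete palette, which is usually a reddish tone, and draws a legend for it.

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81)
)

# "blue" is inside aes(), so it is treated as a one-level categorical variable,
# not as a color name. The points are NOT drawn in blue.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = "blue"))
p

# Inspect the color ggplot2 actually assigned to the points.
unique(layer_data(p)$colour)
```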
You can mix fixed and variable aesthetics in the same plot. For example, here I use color to represent the continent but make all the points larger.
Note that the aes() part must go first. Just another rule you need to remember.
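A sketch of mixing the two kinds of aesthetics, assuming a continent column in a toy data frame hans; the names, the values, and the particular size of 3 are my own choices for illustration:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81),
  continent   = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# Variable aesthetic (inside aes) comes first, fixed aesthetic (size) after it.
p <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = continent), size = 3)
p
```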
More plot types
There are some plot types that do not have a specified y-axis. In these cases the y-axis is determined by an internal model created by the plot. Two types that we will frequently see are geom_bar, for showing counts of a categorical variable, and geom_histogram, for showing the distribution of a numeric variable.
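A sketch of both plot types on a toy data frame hans (invented values, assumed continent column). The black outline, white fill, and bin count in the histogram are my guesses at reasonable settings, not necessarily the ones used in class.

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  life_exp  = c(52, 64, 68, 71, 79, 81),
  continent = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# Bar plot: only x is mapped; the bar heights are row counts per continent.
p_bar <- ggplot(hans, aes(continent)) +
  geom_bar()
p_bar

# Histogram: only x is mapped; the bar heights are counts per bin.
# color (outline) and fill are fixed aesthetics; bins is a plain parameter.
p_hist <- ggplot(hans, aes(life_exp)) +
  geom_histogram(color = "black", fill = "white", bins = 5)
p_hist
```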
Notice that I changed two fixed aesthetics in this second plot (I like my choices better than the default).
As a common trick with bar plots, I often add the layer coord_flip to make the bars go left-to-right. If the categories are long, this makes it easier to read them.
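A minimal sketch of the flipped bar plot, using a toy data frame hans with an assumed continent column and invented values:

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  continent = c("Africa", "Africa", "Asia", "Americas", "Europe", "Americas")
)

# coord_flip() swaps the axes, so the bars run horizontally and the
# category labels are printed along the vertical axis.
p <- ggplot(hans, aes(continent)) +
  geom_bar() +
  coord_flip()
p
```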
We can control the exact colors chosen in the plot using a layer type known as a scale. For example, the color palette from the viridis package can be used to change the colors chosen in a plot:
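A sketch of adding a color scale follows. To keep it self-contained I use scale_color_viridis_c(), the continuous viridis scale that ships with ggplot2 itself, rather than the function from the viridis package, so this is a swap for whatever call the original plot used. The data frame hans and its values are made up.

```r
library(ggplot2)

# Toy stand-in for the class data; these values are invented for illustration.
hans <- data.frame(
  gdp_per_cap = c(900, 4200, 7500, 12000, 31000, 38000),
  life_exp    = c(52, 64, 68, 71, 79, 81)
)

# Same mapping twice: once with the default gradient, once with viridis.
p_default <- ggplot(hans, aes(gdp_per_cap, life_exp)) +
  geom_point(aes(color = life_exp))

# The scale layer changes which colors the values are translated into;
# the mapping itself (life_exp to color) stays the same.
p_viridis <- p_default +
  scale_color_viridis_c()
p_viridis
```

For a categorical variable, ggplot2 offers scale_color_viridis_d() as the discrete counterpart.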
The viridis color palette is optimized for readability for people who are color blind. It also improves the plot when printed in black and white or projected on a badly tuned projector.
If you would like more references, here are a cheat sheet and online notes that extend what we have done today:
These cover much more than we have shown today, and you are only responsible for the notes here. However, you may find the exercises and examples useful if this material is new to you.