Due Date: Noon, 01 October 2020
Total Points: 60
This page outlines the instructions for the first project. You should have a file project01.Rmd in your RStudio Cloud workspace where you can work on the project. You should also have been invited to a Richmond Box folder; please upload your work to the “project01” directory within this shared folder. Finally, note that the project has a presentation component, which will be given in class on 1 October.
For this project you will be creating a small dataset, producing a single annotated visualization, and telling a written and oral story about the plot. The goal is to illustrate command of the core notebooks (1-8) describing the tools we will be working with during the rest of the semester. The story should help us get to know you a bit better; it does not have to be particularly surprising or insightful regarding the dataset, which will be quite small compared to our other projects.
You have three options for constructing a dataset; select whichever one is the most interesting to you. Regardless of the option chosen, you should have a dataset with at least 20 observations and four variables. Keep in mind the notes from notebook07 as you structure the dataset.
Television: Create a dataset of your favorite television show, with one observation per episode. Try to pick a show that has an easy-to-find Wikipedia page for individual episodes. Include as many seasons as needed to get at least 20 rows (though feel free to include more if you like). Record the following information:
Sports: Create a dataset where each row corresponds to one game played by a particular sports team. To get the minimum number of data points, this could be one season of the NFL (with pre-season), two seasons of college football, a long play-off run in the NBA or NHL, or a similar length of time for your favorite sport. Record the following information:
Recipes: Create a dataset of recipes. I suggest grabbing them from a website such as AllRecipes.com, but another site could work as well. Record the following information:
Feel free to add any additional variables that you think will help you tell a story with the dataset.
Once you have the dataset constructed, export it as a csv file and read it into R. Spend some time producing an interesting visualization, with a particular focus on telling a story about yourself. This could be your favorite TV episode(s), something you remember from a specific sports game, or a particular memory of cooking a food item. Make sure to add useful labels and follow the guidelines in notebook08. Save the graphic and add some manual annotations. Then, in your favorite text editor, write a short description (around 250 words) that describes the data and tells a story about the plot. Finally, construct a data dictionary as an additional table and export this as a csv file as well.
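As a rough illustration of the csv and data dictionary steps, here is a minimal sketch. The `episodes` table, its column names, its values, and the temporary file paths are all made up for the example; your own dataset will need at least 20 rows and will live in files that you name yourself.

```r
library(readr)
library(tibble)

# Hypothetical stand-in for a hand-built dataset (a real one needs 20+ rows).
episodes <- tibble(
  season = c(1, 1, 2),
  episode = c(1, 2, 1),
  title = c("First Episode", "Second Episode", "Third Episode"),
  viewers_millions = c(5.2, 4.8, 6.1)
)

# Write the data to a csv file and read it back in. A temporary file is used
# here only so the sketch runs anywhere; in the project this is your own csv.
data_path <- tempfile(fileext = ".csv")
write_csv(episodes, data_path)
episodes_in <- read_csv(data_path)

# A data dictionary has one row per variable in the dataset.
dictionary <- tibble(
  variable = names(episodes_in),
  type = vapply(episodes_in, function(col) class(col)[1], character(1)),
  description = c(
    "Season number",
    "Episode number within the season",
    "Episode title",
    "Viewers, in millions (made-up values)"
  )
)

# Export the data dictionary as its own csv file.
dictionary_path <- tempfile(fileext = ".csv")
write_csv(dictionary, dictionary_path)
```

The descriptions are written by hand because they are the part of a data dictionary that R cannot infer from the data.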
When you are finished, export the essay as a pdf, and upload it to Box along with the annotated graphic (one of: pdf, png, jpg), the csv file of your data, and the csv file of your data dictionary. Be prepared to give a 3-minute presentation about your plot in class on the project’s due date. You do not have to upload your R script.
The project will be graded out of 60 points, according to the following rubric:
You will receive a grade for your work through the shared Box folder. A current participation grade will also be included.
We will be going more in-depth about ways of working with date and date-time data in the coming weeks. Some of you may, however, need to work with date objects in the plot for this project. Let’s say, for example, that you have recorded the day, month, and year of some events:
    ## # A tibble: 6 x 4
    ##   class_day class_month class_year topic
    ##       <dbl>       <dbl>      <dbl> <chr>
    ## 1        27           8       2020 Intro
    ## 2         1           9       2020 Graphics
    ## 3         3           9       2020 Graphics
    ## 4         8           9       2020 Database
    ## 5        10           9       2020 Database
    ## 6        15           9       2020 Database
By loading the lubridate package, we can use the mutate function to create a new column that stores the full date as its own variable, like this:
    library(lubridate)

    # Combine the year, month, and day columns into a single date column
    data_science <- data_science %>%
      mutate(class_date = make_date(class_year, class_month, class_day))
    data_science
    ## # A tibble: 6 x 5
    ##   class_day class_month class_year topic    class_date
    ##       <dbl>       <dbl>      <dbl> <chr>    <date>
    ## 1        27           8       2020 Intro    2020-08-27
    ## 2         1           9       2020 Graphics 2020-09-01
    ## 3         3           9       2020 Graphics 2020-09-03
    ## 4         8           9       2020 Database 2020-09-08
    ## 5        10           9       2020 Database 2020-09-10
    ## 6        15           9       2020 Database 2020-09-15
This column can be used within plots, like this:
    data_science %>%
      ggplot(aes(class_date, topic)) +
      geom_point() +
      theme_sm()
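If you want to try the date example outside the course workspace, the following self-contained sketch rebuilds the small table shown above and saves the plot to a file. It makes a few assumptions: `theme_minimal()` from ggplot2 stands in for the course's `theme_sm()`, the axis labels and title are only placeholders, and the plot is written to a temporary file rather than a file you would actually upload.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Rebuild the example table by hand (values copied from the output above).
data_science <- tibble(
  class_day = c(27, 1, 3, 8, 10, 15),
  class_month = c(8, 9, 9, 9, 9, 9),
  class_year = 2020,
  topic = c("Intro", "Graphics", "Graphics", "Database", "Database", "Database")
)

# Combine the year, month, and day columns into a single date column.
data_science <- data_science %>%
  mutate(class_date = make_date(class_year, class_month, class_day))

# Plot topics against the new date column; theme_minimal() is a stand-in
# for theme_sm(), which is assumed to be available only in the course setup.
p <- data_science %>%
  ggplot(aes(class_date, topic)) +
  geom_point() +
  scale_x_date(date_labels = "%d %b") +
  labs(x = "Class date", y = "Topic", title = "Topics covered by class date") +
  theme_minimal()

# Save the graphic; a temporary path is used here only for illustration.
plot_path <- tempfile(fileext = ".png")
ggsave(plot_path, plot = p, width = 6, height = 4)
```

Because `class_date` is a true date column, ggplot2 spaces the points by calendar time rather than treating each date as a separate category.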