Due Date: Noon, 10 November 2020 (Tuesday)
Total Points: 50
This page outlines the instructions for the second project. You should have a file
project03.Rmd in your RStudio Cloud workspace where you can work on the project.
To simplify things at this point in the semester, this project is a bit less open-ended. I have provided 5 prompts asking questions of the coronavirus infection datasets from the United States. You need to answer each of the prompts through graphics and tables. You will write your analysis in the RMarkdown file; the only thing to hand in is your knit HTML file. There is no presentation.
We will be working on this project in class in your groups, but each student will submit their own copy of the assignment. These can range from carbon-copies of everyone in your group to a completely re-done version of the project. If feasible, you’re welcome to work together with your group or a subset of your group outside of class. However, you are not allowed to work directly or share code with students outside of your assigned class group. Asynchronous students should work on and submit the project on their own.
I will be happy to answer general questions about the project in class, and am always happy to better explain the meaning of each question. However, I may avoid directly answering R coding questions where it gives too much of the answer away. There is, of course, never any harm in asking though if you are stuck!
The project will be graded out of 50 points, 10 points for each question prompt. You will be graded on answering the posed question correctly and on producing plots that follow both the guidelines for constructing informative data visualizations and the general principles for showing spatial and temporal data covered in class.
As noted above, please submit your knit Rmd file as an HTML document on Box. Note that you will not be able to properly preview the file on Box, but you should be able to view it locally on your machine. Everyone must submit their own copy of the project, even if it is exactly the same as others in your group. You will receive a grade for your work through the shared Box folder. A current participation grade will also be included.
In the interest of fairness, I will summarize some of the major notes and hints that I gave in class for the five questions.
The semi_join function is your friend here, but you can also manually filter by FIPS codes. Do not use the county names to subset the data, because county names are not unique.
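To illustrate why FIPS codes are the safer key, here is a minimal sketch on made-up data. The table names (cases, keep), the column names, and the case counts are hypothetical and are not the project datasets; the only real function calls are tibble, filter, and semi_join from dplyr.

```r
library(dplyr)

# Hypothetical toy data: two different counties share the name "Washington".
cases <- tibble(
  fips = c("01129", "05143", "51001"),
  county_name = c("Washington", "Washington", "Accomack"),
  cases = c(10, 20, 30)
)

# Hypothetical lookup table of the counties of interest, keyed by FIPS code.
keep <- tibble(fips = c("05143", "51001"))

# Filtering on the name matches both "Washington" counties.
by_name <- filter(cases, county_name == "Washington")

# semi_join keeps only the rows of `cases` whose fips appears in `keep`,
# and it does not add any columns from `keep`.
cases_subset <- semi_join(cases, keep, by = "fips")
```

Here by_name contains two rows from two different counties, while cases_subset contains exactly the two counties listed in keep.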
Use the sum function, rather than sm_sum. An easy way to create buckets for the densities is with the following:
county_density <- county %>%
  left_join(demog, by = c("fips", "state")) %>%
  # area of each county polygon, converted to square kilometers
  mutate(area = as.numeric(set_units(st_area(geometry), "km^2"))) %>%
  mutate(density = population / area) %>%
  # five equal-width buckets on the log2 scale, labeled 1 to 5
  mutate(bucket = cut(log2(density + 1), breaks = 5, labels = FALSE)) %>%
  as_tibble()
Note that you do not need to use the spatial data