Instructions

Deadline: Wednesday, 14 September 2022 at the start of class

This exam is the same format as the notebooks we have been completing in class. In order to complete it correctly, please keep in mind that:

  1. In each section there is exactly one plot that you need to create. It will often require using data verbs to create the plot, but it is up to you how to do that.

  2. There are nine questions on this exam. Each is worth 10 points. An additional 10 points is assigned based on your code formatting across the entire exam. This take-home exam will count for half of your Exam 01 grade.

  3. You must Knit the file to an HTML format, print the file, and then bring the exam to class on Wednesday. Some questions require you to build a plot with specific colors; you do not need to print the HTML in color. I will be able to tell if you did the correct thing based on the code.

  4. You may use any static resources, such as course notes and external websites, but you may not discuss the exam with classmates or anyone else.

  5. I am happy to answer clarifying questions about the exam or to help you with unexpected R errors. However, I will not answer content-based questions after the exam is posted. Note that I may not be able to answer questions sent after 8pm on Tuesday night.

  6. The exam should take no more than 2 hours, but you may use as much time as you need.

  7. Personal computer issues are not an excuse for not completing the exam on time. As a backup, you can use the computers in the Jepson computer lab or in the library.

Good luck!

Shark Attacks in Australia

The data set for this exam consists of a set of observations from nearly 400 shark attacks on humans that occurred in Australia. You can read the data in with the following:

source("../funs/funs.R")
sharks <- read_csv("../data/shark_attacks.csv")
sharks
## # A tibble: 367 × 12
##     year month outcome     lon   lat shark…¹ shark…² num_s…³ provo…⁴ victi…⁵
##    <dbl> <dbl> <chr>     <dbl> <dbl> <chr>     <dbl>   <dbl> <chr>   <chr>  
##  1  1945     2 injured    151. -34.0 wobbeg…     1.3       1 provok… fishing
##  2  1946     2 injured    116. -31.9 white …     4.2       1 unprov… swimmi…
##  3  1946     8 fatal      146. -16.7 tiger …     4.8       1 unprov… swimmi…
##  4  1947    12 injured    153. -31.7 bull s…     2         1 provok… fishing
##  5  1948     2 fatal      152. -32.8 tiger …     4         1 unprov… swimmi…
##  6  1949     1 uninjured  151. -33.7 white …     3         1 unprov… boardi…
##  7  1949     1 fatal      152. -32.9 white …     4         1 unprov… swimmi…
##  8  1949     4 fatal      146. -16.7 tiger …     3.6       1 unprov… swimmi…
##  9  1949     5 injured    122. -18.0 bull s…     2.7       1 unprov… swimmi…
## 10  1949    11 injured    151. -34   wobbeg…     2         2 provok… swimmi…
## # … with 357 more rows, 2 more variables: age <dbl>, gender <chr>, and
## #   abbreviated variable names ¹​shark_type, ²​shark_length, ³​num_sharks,
## #   ⁴​provoked, ⁵​victim_activity

Here is a data dictionary for the features:

If you have questions about what these features mean, please let me know.

1. Draw a map

In the code block below, create a scatter plot with longitude on the x-axis, latitude on the y-axis, and one point for each shark attack in the data.

sharks %>%
  ggplot(aes(lon, lat)) +
    geom_point()

2. Draw a map, with color

In the code block below, create a scatter plot with longitude on the x-axis, latitude on the y-axis, and one point for each shark attack in the data. Color the points based on the outcome of the attack (fatal, injured, or uninjured). Use the color-blind friendly color scale.

sharks %>%
  ggplot(aes(lon, lat)) +
    geom_point(aes(color = outcome)) +
    scale_color_viridis_d()

3. Types of sharks

In the code block below, create a bar plot with shark type on the y-axis and the number of attacks for a given shark type on the x-axis. Arrange the sharks from the smallest to the largest number of attacks (i.e., the shark with the lowest number of attacks should be in the upper-left-hand corner of the plot).

sharks %>%
  group_by(shark_type) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%
  ggplot(aes(n, fct_inorder(shark_type))) +
    geom_col()
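
The ordering in this solution depends on how fct_inorder() assigns factor levels: levels follow the row order after arrange(), and ggplot2 places the first level of a discrete y-axis at the bottom. The sketch below checks the level order on a small made-up table (the type labels and counts are hypothetical, not taken from the exam data).

```r
library(dplyr)
library(forcats)

# Made-up rows for illustration only; not the shark attack data
toy <- tibble(
  shark_type = c("type A", "type B", "type C", "type A", "type A", "type B")
)

counts <- toy %>%
  group_by(shark_type) %>%
  summarize(n = n()) %>%
  arrange(desc(n)) %>%                           # largest count first
  mutate(shark_type = fct_inorder(shark_type))   # levels follow row order

# The first level is the type with the most attacks; on a discrete
# y-axis it is drawn at the bottom, so the smallest count ends up on top.
levels(counts$shark_type)
```

Sorting in descending order therefore puts the type with the fewest attacks at the top of the plot, as the question asks.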

4. Another map, with color

In the code block below, create a scatter plot with longitude on the x-axis, latitude on the y-axis, and one point for each shark attack in the data. Color the points in the light grey color “grey85”, while highlighting the attacks made by the type of shark with the most attacks (as determined in the previous question) in “red”.

# How I expected you to do this
white_shark <- sharks %>%
  filter(shark_type %in% c("white shark"))

sharks %>%
  ggplot(aes(lon, lat)) +
    geom_point(color = "grey85") +
    geom_point(color = "red", data = white_shark)

# Another approach with mutate()
sharks %>%
  mutate(my_color = if_else(shark_type == "white shark", "red", "grey85")) %>%
  # "grey85" sorts before "red", so the red points are drawn last (on top)
  arrange(my_color) %>%
  ggplot(aes(lon, lat)) +
    geom_point(aes(color = my_color)) +
    scale_color_identity()

5. Draw a line plot

In the code block below, draw a line plot (geom_line) with year on the x-axis and the number of shark attacks in a given year on the y-axis.

sharks %>%
  group_by(year) %>%
  summarise(n = n()) %>%
  ggplot(aes(year, n)) +
    geom_line()

6. Draw a line plot

In the code block below, draw a line plot (geom_line) with year on the x-axis and the number of shark attacks in a given year on the y-axis just as you did in the previous question. This time, add a red dot showing the year with the greatest number of attacks. The dot should be located at the year with the maximum number of attacks (x-axis) and the maximum number of attacks (y-axis).

most_attacks <- sharks %>%
  group_by(year) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  slice(1)

sharks %>%
  group_by(year) %>%
  summarise(n = n()) %>%
  ggplot(aes(year, n)) +
    geom_line() +
    geom_point(data = most_attacks, color = "red")

7. Activities vs. Fatalities

In the code block below, create a scatter plot with the victim activity on the y-axis and the percentage of attacks that were fatal for victims engaged in that activity on the x-axis. Order the activities from the least likely to be fatal to the most likely to be fatal.

sharks %>%
  group_by(victim_activity) %>%
  summarise(fatal_mean = mean(outcome == "fatal")) %>%
  arrange(desc(fatal_mean)) %>%
  ggplot(aes(fatal_mean, fct_inorder(victim_activity))) +
    geom_point()
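
The question asks for a percentage, while mean(outcome == "fatal") returns a proportion between 0 and 1. The plot has the same shape either way. If a 0 to 100 scale is wanted, one possible variant multiplies by 100, sketched here on a small made-up table (the rows and the column name fatal_pct are hypothetical).

```r
library(dplyr)

# Made-up rows for illustration only; not the shark attack data
toy <- tibble(
  victim_activity = c("swimming", "swimming", "fishing",
                      "fishing", "fishing", "boarding"),
  outcome = c("fatal", "injured", "injured",
              "injured", "fatal", "uninjured")
)

fatal_by_activity <- toy %>%
  group_by(victim_activity) %>%
  # mean() of a logical vector is the share of TRUE values
  summarise(fatal_pct = 100 * mean(outcome == "fatal")) %>%
  arrange(desc(fatal_pct))

fatal_by_activity
```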

8. Shark Type Scatterplot

In the code block below, create a scatterplot where each point corresponds to a type of shark. The x-axis should capture the average length of that shark across all attacks and the y-axis should capture the average age of people attacked by the given type of shark. The size of the points should represent the number of attacks that were made by each type of shark.

sharks %>%
  group_by(shark_type) %>%
  summarise(avg_length = mean(shark_length), avg_age = mean(age), n = n()) %>%
  ggplot(aes(avg_length, avg_age)) +
    geom_point(aes(size = n)) +
    geom_text_repel(aes(label = shark_type))
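
The printed rows of the data do not show any missing values, but if shark_length or age did contain NA (an assumption, not something the exam states), mean() would return NA for that shark type and the point would be dropped from the plot. A minimal sketch of a more defensive summary, using a made-up table with hypothetical type labels:

```r
library(dplyr)

# Made-up rows for illustration only; the NA values are an assumption
toy <- tibble(
  shark_type   = c("type A", "type A", "type B"),
  shark_length = c(2, NA, 3),
  age          = c(20, 30, NA)
)

toy_summary <- toy %>%
  group_by(shark_type) %>%
  summarise(
    avg_length = mean(shark_length, na.rm = TRUE),  # ignore missing lengths
    avg_age    = mean(age, na.rm = TRUE),           # ignore missing ages
    n          = n()                                # counts all rows, NA or not
  )

# A group whose values are all missing yields NaN rather than a number
toy_summary
```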

9. Oldest Victims

In the code block below, create a plot with year on the x-axis and the age of the victim on the y-axis, including only the 15 oldest victims. Also include labels showing the activity the victims were engaged in.

sharks %>%
  arrange(desc(age)) %>%
  slice(1:15) %>%
  ggplot(aes(year, age)) +
    geom_point() +
    geom_text_repel(aes(label = victim_activity))