Instructions

Deadline: Tuesday, 26 October 2022 at the start of class

This exam is the same format as the notebooks we have been completing in class. In order to complete it correctly, please keep in mind that:

  1. In each section there is exactly one plot or data table that you need to produce. This will often require using multiple data verbs and may also require creating temporary data tables in order to solution the problem. Unless otherwise specified in the question, any valid solution will get full credit.

  2. There are nine questions on this exam. Each is worth 10 points. An additional 10 points is assigned based on your code formatting across the entire exam. This take-home exam will count for half of your Exam 02 grade.

  3. You must Knit the file to an HTML format, print the file, and then bring the exam to class on Wednesday. Some questions require you to build a plot with specific colors; you do not need to print the file in color. I will be able to tell if you did the correct thing based on the code.

  4. You may use any static resources, such as course notes and external websites, but you may not discuss the exam with classmates anyone else.

  5. I am happy to answer clarifying questions about the exam or to help you with unexpected R errors. However, I will not answer content-based questions after the exam is posted. Note that I may not be able to answer questions sent after 8pm on Tuesday night.

  6. The exam should take no more than 3 hours, but you may use as much time as you need.

  7. Personal computer issues is not an excuse for not completing the exam on time. As a back up, you can use the computers in the Jepson computer lab or in the library.

Good luck!

Bird Building Collisions

The data for this exam consists of a study looking at birds flying into buildings in downtown Chicago. The data comes from the following paper:

Winger Benjamin M., Weeks Brian C., Farnsworth Andrew, Jones Andrew W., Hennen Mary and Willard David E. 2019. “Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proc. R. Soc. B. 286: 20190364. http://doi.org/10.1098/rspb.2019.0364

The primary data table is called colls and consists of one row for each bird that ran into a building. Features give the type of bird, the date of the collision, the year of the collision, and the month of the collision as a number. The primary key here is species.

source("../funs/funs.R")

colls <- read_csv("../data/bird_coll_chicago.csv.bz2")
colls
## # A tibble: 3,491 × 5
##    species          genus       date        year month
##    <chr>            <chr>       <date>     <dbl> <dbl>
##  1 P. sandwichensis Passerculus 2000-05-01  2000     5
##  2 P. sandwichensis Passerculus 2000-05-01  2000     5
##  3 P. sandwichensis Passerculus 2000-05-01  2000     5
##  4 P. sandwichensis Passerculus 2000-05-04  2000     5
##  5 P. sandwichensis Passerculus 2000-05-12  2000     5
##  6 P. sandwichensis Passerculus 2000-09-24  2000     9
##  7 P. sandwichensis Passerculus 2000-10-07  2000    10
##  8 P. sandwichensis Passerculus 2000-10-07  2000    10
##  9 P. sandwichensis Passerculus 2000-10-07  2000    10
## 10 P. sandwichensis Passerculus 2000-10-07  2000    10
## # … with 3,481 more rows

We also have a data table describing the different bird species in the data. This table includes the number of birds estimated to be in the state of Illinois as well as the common name in English. The primary key here is species.

birds <- read_csv("../data/bird_coll_species.csv.bz2")
birds
## # A tibble: 37 × 4
##    species       genus       english_name             population_estimate
##    <chr>         <chr>       <chr>                                  <dbl>
##  1 M. crinitus   Myiarchus   Great Crested Flycatcher              150000
##  2 C. virens     Contopus    Eastern Wood-Pewee                    240000
##  3 E. traillii   Empidonax   Willow Flycatcher                      69000
##  4 E. minimus    Empidonax   Least Flycatcher                         860
##  5 S. phoebe     Sayornis    Eastern Phoebe                        670000
##  6 V. flavifrons Vireo       Yellow-throated Vireo                  73000
##  7 V. olivaceus  Vireo       Red-eyed Vireo                        140000
##  8 T. aedon      Troglodytes House Wren                           1400000
##  9 C. platensis  Cistothorus Sedge Wren                             17000
## 10 C. palustris  Cistothorus Marsh Wren                              4000
## # … with 27 more rows

We have another table describing the genus of the birds (genus is a way of grouping together multiple species). The table tells us whether this group of birds uses “flight calls” for navigation (similar to a bat’s sonar), the type of habitat it tends to live in, and whether the bird lives in the lower part or upper part of the forest. The primary key here is genus.

genus <- read_csv("../data/bird_coll_genus.csv.bz2")
genus
## # A tibble: 27 × 5
##    genus       family        flight_call habitat stratum
##    <chr>       <chr>         <chr>       <chr>   <chr>  
##  1 Ammodramus  Passerellidae yes         open    lower  
##  2 Catharus    Turdidae      yes         forest  lower  
##  3 Cistothorus Troglodytidae no          open    lower  
##  4 Contopus    Tyrannidae    no          forest  upper  
##  5 Dumetella   Mimidae       no          edge    lower  
##  6 Empidonax   Tyrannidae    no          open    upper  
##  7 Geothlypis  Parulidae     yes         open    lower  
##  8 Hylocichla  Turdidae      yes         forest  lower  
##  9 Icterus     Icteridae     yes         edge    upper  
## 10 Melospiza   Passerellidae yes         edge    lower  
## # … with 17 more rows

Finally, we have a measurement describing the level of light pollution in the city of Chicago on each day in the colls data. I cannot find the units of measurement for the light score; it will not be needed in the questions below. The light levels are also divded into “low” and “high” such that about half of the days are “low” and half are “high”. The primary key here is date.

light <- read_csv("../data/bird_coll_light.csv.bz2")
light
## # A tibble: 1,048 × 3
##    date       light_score light_level
##    <date>           <dbl> <chr>      
##  1 2000-03-06           3 low        
##  2 2000-03-08          15 high       
##  3 2000-03-10           3 low        
##  4 2000-03-31           3 low        
##  5 2000-04-02          17 high       
##  6 2000-04-14           4 low        
##  7 2000-05-01          14 high       
##  8 2000-05-03           3 low        
##  9 2000-05-04          14 high       
## 10 2000-05-05          16 high       
## # … with 1,038 more rows

Please let me know if you have any questions about the tables or features.

1. Collisions over time

In the code block below, produce a plot with year on the x-axis and number of bird collisions on the y-axis. Using both a line and a points geometry, show the total number of bird collisions observed in each year of the data.

colls %>%
  group_by(year) %>%
  summarize(n = n()) %>%
  ggplot(aes(year, n)) +
    geom_line() +
    geom_point()

2. Bird Collisions by Species

In the code block below, produce a plot with number of collisions on the x-axis and bird species on the y-axis. This should be a bar plot that shows the total number of collisions by each species. Order the species in either descending or ascending order (your choice) of the number of collisions. In place of the scientific species names used in the colls data, use the English names found in the birds data for the y-axis.

colls %>%
  group_by(species) %>%
  summarise(n = n()) %>%
  left_join(birds, by = "species") %>%
  arrange(n) %>%
  mutate(english_name = fct_inorder(english_name)) %>%
  ggplot(aes(n, english_name)) +
    geom_col()

3. Species by Flight Call

In the code block below, re-create the plot you made for question 2, but now make the color of the bars (by color, I mean the fill aesthetic) correspond to the flight_call variable describing each type of bird’s flight call pattern as found in the genus data. Use a color-blind friend set of colors.

colls %>%
  group_by(species) %>%
  summarise(n = n()) %>%
  left_join(birds, by = "species") %>%
  left_join(genus, by = "genus") %>%
  arrange(n) %>%
  mutate(english_name = fct_inorder(english_name)) %>%
  ggplot(aes(n, english_name)) +
    geom_col(aes(fill = flight_call)) +
    scale_fill_viridis_d()

4. Species by Flight Call (labels)

In the code block below, re-create the plot you made for question 3, but now add appropriate labels for the x-axis, y-axis, and the color/fill legend. Also add a title, subtitle, and caption to the plot. Do not stress about the specific labels; any roughly appropriate choice will get full credit.

Note: this the only question that requires plot labels.

colls %>%
  group_by(species) %>%
  summarise(n = n()) %>%
  left_join(birds, by = "species") %>%
  left_join(genus, by = "genus") %>%
  arrange(n) %>%
  mutate(english_name = fct_inorder(english_name)) %>%
  ggplot(aes(n, english_name)) +
    geom_col(aes(fill = flight_call)) +
    scale_fill_viridis_d() +
    labs(x = "Number of Collisions (2000-2016)",
         y = "Bird Species",
         fill = "Flight Call",
         title = "Bird Building Collisions",
         subtitle = "A Study from Chicago, IL",
         caption = "http://doi.org/10.1098/rspb.2019.0364")

5. Collisions and Season

In the code block below, produce a scatter plot where each point represents a bird species. The plot should have the number of times a species collided with a building between months 1-6 (the spring migration; January through June) on the x-axis and the number of times they collided with a building between months 7-12 (the fall migration; July through December) on the y-axis. Filter to only include those species with at least 20 collisions during the spring migration (months 1-6) and include a text repel layer to show the (scientific) names of the species.

Hint: The best way to do this requires a pivot function. However, full credit will be given for any valid solution.

colls %>%
  mutate(season = if_else(month < 7, "spring", "fall")) %>%
  group_by(season, species) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = season, values_from = n, values_fill = 0) %>%
  filter(spring >= 20) %>%
  ggplot(aes(spring, fall)) +
    geom_point() +
    geom_text_repel(aes(label = species))

6. Collisions and Season

In the code block below, add a linear regression line to the plot you made in question 5 predicting the total number of collisions by species in the fall migrations using the number of collisions in the spring migrations.

You must use the lm function and augment function (in other words, not geom_smooth) to receive full credit for this question.

temp <- colls %>%
  mutate(season = if_else(month < 7, "spring", "fall")) %>%
  group_by(season, species) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = season, values_from = n, values_fill = 0) %>%
  filter(spring >= 20)

model <- lm(fall ~ spring, data = temp)

model %>%
  augment(newdata = temp) %>%
  ggplot(aes(spring, fall)) +
    geom_point() +
    geom_text_repel(aes(label = species)) +
    geom_line(aes(spring, .fitted), color = "red", linetype = "dashed")

7. Light Pollution

In the code block below, produce a plot with the light_score on the x-axis and the number of bird collisions on the y-axis. Draw a geom points layer with one point for each light score showing the total number of bird collisions for a given amount of light.

colls %>%
  left_join(light, by = "date") %>%
  group_by(light_score) %>%
  summarize(n = n()) %>%
  ggplot(aes(light_score, n)) +
    geom_point()

8. Rate of Collision based on Flight Call Type

In this question, you’ll produce a table with three rows, one for each unique value of the variable flight_call. We want to compute the proportion of bird collisions per 1 million birds in the wild population based on the flight call (in other words, does this species of bird use sound as a navigation tool at night). Specifically, for each call type we want to compute the following:

rate = (num. of collisions with birds of this flight call * 1000000) / (num. of birds in the wild with this flight call)

Hint: The easiest way to do this is to create two temporary tables, one with the number of collisions by flight call type and another with the number of birds in the wild (from the birds data table) by flight call pattern. Then, combine the tables with a join and compute the rate.

tab1 <- colls %>%
  left_join(genus, by = "genus") %>%
  group_by(flight_call) %>%
  summarize(n = n())

tab2 <- birds %>%
  left_join(genus, by = c("genus")) %>%
  group_by(flight_call) %>%
  summarize(num = sum(population_estimate))

tab1 %>%
  left_join(tab2, by = "flight_call") %>%
  mutate(rate = n / num * 1000000)
## # A tibble: 3 × 4
##   flight_call     n      num  rate
##   <chr>       <int>    <dbl> <dbl>
## 1 no            200  3763860  53.1
## 2 rare           14   390000  35.9
## 3 yes          3277 15139030 216.

9. Rate of Collision based on Flight Call Type and Light Pattern

In this final question, we want to do something similar to question 8, but this time produce a table with six rows, one for each combination of flight call and the variable called light_level (which is either “low” or “high”). The rate then becomes, for each group of flight calls and light level:

rate = (num. of collisions with birds of this flight call under this light condition * 1000000) / (num. of birds in the wild with this flight call)

Story: You should be able to replicate the findings of the paper, which showed that the highest rate of collisions with buildings occurs with birds that use flight calls for navigation under situations with a high amount of light pollution.

tab1 <- colls %>%
  left_join(genus, by = "genus") %>%
  left_join(light, by = "date") %>%
  group_by(flight_call, light_level) %>%
  summarize(n = n())

tab2 <- birds %>%
  left_join(genus, by = c("genus")) %>%
  group_by(flight_call) %>%
  summarize(num = sum(population_estimate))

tab1 %>%
  left_join(tab2, by = "flight_call") %>%
  mutate(rate = n / num * 1000000)
## # A tibble: 6 × 5
## # Groups:   flight_call [3]
##   flight_call light_level     n      num   rate
##   <chr>       <chr>       <int>    <dbl>  <dbl>
## 1 no          high          108  3763860  28.7 
## 2 no          low            94  3763860  25.0 
## 3 rare        high           13   390000  33.3 
## 4 rare        low             1   390000   2.56
## 5 yes         high         2139 15139030 141.  
## 6 yes         low          1143 15139030  75.5