Deadline: Tuesday, 26 October 2022 at the start of class
This exam is the same format as the notebooks we have been completing in class. In order to complete it correctly, please keep in mind that:
In each section there is exactly one plot or data table that you need to produce. This will often require using multiple data verbs and may also require creating temporary data tables in order to solution the problem. Unless otherwise specified in the question, any valid solution will get full credit.
There are nine questions on this exam. Each is worth 10 points. An additional 10 points is assigned based on your code formatting across the entire exam. This take-home exam will count for half of your Exam 02 grade.
You must Knit the file to an HTML format, print the file, and then bring the exam to class on Wednesday. Some questions require you to build a plot with specific colors; you do not need to print the file in color. I will be able to tell if you did the correct thing based on the code.
You may use any static resources, such as course notes and external websites, but you may not discuss the exam with classmates anyone else.
I am happy to answer clarifying questions about the exam or to help you with unexpected R errors. However, I will not answer content-based questions after the exam is posted. Note that I may not be able to answer questions sent after 8pm on Tuesday night.
The exam should take no more than 3 hours, but you may use as much time as you need.
Personal computer issues is not an excuse for not completing the exam on time. As a back up, you can use the computers in the Jepson computer lab or in the library.
Good luck!
The data for this exam consists of a study looking at birds flying into buildings in downtown Chicago. The data comes from the following paper:
Winger Benjamin M., Weeks Brian C., Farnsworth Andrew, Jones Andrew W., Hennen Mary and Willard David E. 2019. “Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proc. R. Soc. B. 286: 20190364. http://doi.org/10.1098/rspb.2019.0364
The primary data table is called colls
and consists of
one row for each bird that ran into a building. Features give the type
of bird, the date of the collision, the year of the collision, and the
month of the collision as a number. The primary key here is
species
.
source("../funs/funs.R")
<- read_csv("../data/bird_coll_chicago.csv.bz2")
colls colls
## # A tibble: 3,491 × 5
## species genus date year month
## <chr> <chr> <date> <dbl> <dbl>
## 1 P. sandwichensis Passerculus 2000-05-01 2000 5
## 2 P. sandwichensis Passerculus 2000-05-01 2000 5
## 3 P. sandwichensis Passerculus 2000-05-01 2000 5
## 4 P. sandwichensis Passerculus 2000-05-04 2000 5
## 5 P. sandwichensis Passerculus 2000-05-12 2000 5
## 6 P. sandwichensis Passerculus 2000-09-24 2000 9
## 7 P. sandwichensis Passerculus 2000-10-07 2000 10
## 8 P. sandwichensis Passerculus 2000-10-07 2000 10
## 9 P. sandwichensis Passerculus 2000-10-07 2000 10
## 10 P. sandwichensis Passerculus 2000-10-07 2000 10
## # … with 3,481 more rows
We also have a data table describing the different bird species in
the data. This table includes the number of birds estimated to be in the
state of Illinois as well as the common name in English. The primary key
here is species
.
<- read_csv("../data/bird_coll_species.csv.bz2")
birds birds
## # A tibble: 37 × 4
## species genus english_name population_estimate
## <chr> <chr> <chr> <dbl>
## 1 M. crinitus Myiarchus Great Crested Flycatcher 150000
## 2 C. virens Contopus Eastern Wood-Pewee 240000
## 3 E. traillii Empidonax Willow Flycatcher 69000
## 4 E. minimus Empidonax Least Flycatcher 860
## 5 S. phoebe Sayornis Eastern Phoebe 670000
## 6 V. flavifrons Vireo Yellow-throated Vireo 73000
## 7 V. olivaceus Vireo Red-eyed Vireo 140000
## 8 T. aedon Troglodytes House Wren 1400000
## 9 C. platensis Cistothorus Sedge Wren 17000
## 10 C. palustris Cistothorus Marsh Wren 4000
## # … with 27 more rows
We have another table describing the genus of the birds (genus is a
way of grouping together multiple species). The table tells us whether
this group of birds uses “flight calls” for navigation (similar to a
bat’s sonar), the type of habitat it tends to live in, and whether the
bird lives in the lower part or upper part of the forest. The primary
key here is genus
.
<- read_csv("../data/bird_coll_genus.csv.bz2")
genus genus
## # A tibble: 27 × 5
## genus family flight_call habitat stratum
## <chr> <chr> <chr> <chr> <chr>
## 1 Ammodramus Passerellidae yes open lower
## 2 Catharus Turdidae yes forest lower
## 3 Cistothorus Troglodytidae no open lower
## 4 Contopus Tyrannidae no forest upper
## 5 Dumetella Mimidae no edge lower
## 6 Empidonax Tyrannidae no open upper
## 7 Geothlypis Parulidae yes open lower
## 8 Hylocichla Turdidae yes forest lower
## 9 Icterus Icteridae yes edge upper
## 10 Melospiza Passerellidae yes edge lower
## # … with 17 more rows
Finally, we have a measurement describing the level of light
pollution in the city of Chicago on each day in the colls
data. I cannot find the units of measurement for the light score; it
will not be needed in the questions below. The light levels are also
divded into “low” and “high” such that about half of the days are “low”
and half are “high”. The primary key here is date
.
<- read_csv("../data/bird_coll_light.csv.bz2")
light light
## # A tibble: 1,048 × 3
## date light_score light_level
## <date> <dbl> <chr>
## 1 2000-03-06 3 low
## 2 2000-03-08 15 high
## 3 2000-03-10 3 low
## 4 2000-03-31 3 low
## 5 2000-04-02 17 high
## 6 2000-04-14 4 low
## 7 2000-05-01 14 high
## 8 2000-05-03 3 low
## 9 2000-05-04 14 high
## 10 2000-05-05 16 high
## # … with 1,038 more rows
Please let me know if you have any questions about the tables or features.
In the code block below, produce a plot with year on the x-axis and number of bird collisions on the y-axis. Using both a line and a points geometry, show the total number of bird collisions observed in each year of the data.
%>%
colls group_by(year) %>%
summarize(n = n()) %>%
ggplot(aes(year, n)) +
geom_line() +
geom_point()
In the code block below, produce a plot with number of collisions on
the x-axis and bird species on the y-axis. This should be a bar plot
that shows the total number of collisions by each species. Order the
species in either descending or ascending order (your choice) of the
number of collisions. In place of the scientific species names used in
the colls
data, use the English names found in the
birds
data for the y-axis.
%>%
colls group_by(species) %>%
summarise(n = n()) %>%
left_join(birds, by = "species") %>%
arrange(n) %>%
mutate(english_name = fct_inorder(english_name)) %>%
ggplot(aes(n, english_name)) +
geom_col()
In the code block below, re-create the plot you made for question 2,
but now make the color of the bars (by color, I mean the
fill
aesthetic) correspond to the flight_call
variable describing each type of bird’s flight call pattern as found in
the genus
data. Use a color-blind friend set of colors.
%>%
colls group_by(species) %>%
summarise(n = n()) %>%
left_join(birds, by = "species") %>%
left_join(genus, by = "genus") %>%
arrange(n) %>%
mutate(english_name = fct_inorder(english_name)) %>%
ggplot(aes(n, english_name)) +
geom_col(aes(fill = flight_call)) +
scale_fill_viridis_d()
In the code block below, re-create the plot you made for question 3, but now add appropriate labels for the x-axis, y-axis, and the color/fill legend. Also add a title, subtitle, and caption to the plot. Do not stress about the specific labels; any roughly appropriate choice will get full credit.
Note: this the only question that requires plot labels.
%>%
colls group_by(species) %>%
summarise(n = n()) %>%
left_join(birds, by = "species") %>%
left_join(genus, by = "genus") %>%
arrange(n) %>%
mutate(english_name = fct_inorder(english_name)) %>%
ggplot(aes(n, english_name)) +
geom_col(aes(fill = flight_call)) +
scale_fill_viridis_d() +
labs(x = "Number of Collisions (2000-2016)",
y = "Bird Species",
fill = "Flight Call",
title = "Bird Building Collisions",
subtitle = "A Study from Chicago, IL",
caption = "http://doi.org/10.1098/rspb.2019.0364")
In the code block below, produce a scatter plot where each point represents a bird species. The plot should have the number of times a species collided with a building between months 1-6 (the spring migration; January through June) on the x-axis and the number of times they collided with a building between months 7-12 (the fall migration; July through December) on the y-axis. Filter to only include those species with at least 20 collisions during the spring migration (months 1-6) and include a text repel layer to show the (scientific) names of the species.
Hint: The best way to do this requires a pivot function. However, full credit will be given for any valid solution.
%>%
colls mutate(season = if_else(month < 7, "spring", "fall")) %>%
group_by(season, species) %>%
summarize(n = n()) %>%
pivot_wider(names_from = season, values_from = n, values_fill = 0) %>%
filter(spring >= 20) %>%
ggplot(aes(spring, fall)) +
geom_point() +
geom_text_repel(aes(label = species))
In the code block below, add a linear regression line to the plot you made in question 5 predicting the total number of collisions by species in the fall migrations using the number of collisions in the spring migrations.
You must use the lm
function and
augment
function (in other words, not
geom_smooth
) to receive full credit for this question.
<- colls %>%
temp mutate(season = if_else(month < 7, "spring", "fall")) %>%
group_by(season, species) %>%
summarize(n = n()) %>%
pivot_wider(names_from = season, values_from = n, values_fill = 0) %>%
filter(spring >= 20)
<- lm(fall ~ spring, data = temp)
model
%>%
model augment(newdata = temp) %>%
ggplot(aes(spring, fall)) +
geom_point() +
geom_text_repel(aes(label = species)) +
geom_line(aes(spring, .fitted), color = "red", linetype = "dashed")
In the code block below, produce a plot with the
light_score
on the x-axis and the number of bird collisions
on the y-axis. Draw a geom points layer with one point for each light
score showing the total number of bird collisions for a given amount of
light.
%>%
colls left_join(light, by = "date") %>%
group_by(light_score) %>%
summarize(n = n()) %>%
ggplot(aes(light_score, n)) +
geom_point()
In this question, you’ll produce a table with three rows, one for
each unique value of the variable flight_call
. We want to
compute the proportion of bird collisions per 1 million birds in the
wild population based on the flight call (in other words, does this
species of bird use sound as a navigation tool at night). Specifically,
for each call type we want to compute the following:
rate = (num. of collisions with birds of this flight call * 1000000) / (num. of birds in the wild with this flight call)
Hint: The easiest way to do this is to create two temporary tables,
one with the number of collisions by flight call type and another with
the number of birds in the wild (from the birds
data table)
by flight call pattern. Then, combine the tables with a join and compute
the rate.
<- colls %>%
tab1 left_join(genus, by = "genus") %>%
group_by(flight_call) %>%
summarize(n = n())
<- birds %>%
tab2 left_join(genus, by = c("genus")) %>%
group_by(flight_call) %>%
summarize(num = sum(population_estimate))
%>%
tab1 left_join(tab2, by = "flight_call") %>%
mutate(rate = n / num * 1000000)
## # A tibble: 3 × 4
## flight_call n num rate
## <chr> <int> <dbl> <dbl>
## 1 no 200 3763860 53.1
## 2 rare 14 390000 35.9
## 3 yes 3277 15139030 216.
In this final question, we want to do something similar to question
8, but this time produce a table with six rows, one for each combination
of flight call and the variable called light_level
(which
is either “low” or “high”). The rate then becomes, for each group of
flight calls and light level:
rate = (num. of collisions with birds of this flight call under this light condition * 1000000) / (num. of birds in the wild with this flight call)
Story: You should be able to replicate the findings of the paper, which showed that the highest rate of collisions with buildings occurs with birds that use flight calls for navigation under situations with a high amount of light pollution.
<- colls %>%
tab1 left_join(genus, by = "genus") %>%
left_join(light, by = "date") %>%
group_by(flight_call, light_level) %>%
summarize(n = n())
<- birds %>%
tab2 left_join(genus, by = c("genus")) %>%
group_by(flight_call) %>%
summarize(num = sum(population_estimate))
%>%
tab1 left_join(tab2, by = "flight_call") %>%
mutate(rate = n / num * 1000000)
## # A tibble: 6 × 5
## # Groups: flight_call [3]
## flight_call light_level n num rate
## <chr> <chr> <int> <dbl> <dbl>
## 1 no high 108 3763860 28.7
## 2 no low 94 3763860 25.0
## 3 rare high 13 390000 33.3
## 4 rare low 1 390000 2.56
## 5 yes high 2139 15139030 141.
## 6 yes low 1143 15139030 75.5