Deadline: Wednesday, 16 November 2022 at the start of class
This exam follows the same format as the notebooks we have been completing in class. In order to complete it correctly, please keep in mind that:
In each section there is exactly one plot or data table that you need to produce. This will often require using multiple data verbs and may also require creating temporary data tables in order to solve the problem. Unless otherwise specified in the question, any valid solution will get full credit.
There are nine questions on this exam. Each is worth 10 points. An additional 10 points is assigned based on your code formatting across the entire exam. This take-home exam will count for half of your Exam 02 grade.
You must Knit the file to an HTML format, print the file, and then bring the exam to class on Wednesday. Some questions require you to build a plot with specific colors; you do not need to print the file in color. I will be able to tell if you did the correct thing based on the code.
You may use any static resources, such as course notes and external websites, but you may not discuss the exam with classmates or anyone else.
I am happy to answer clarifying questions about the exam or to help you with unexpected R errors. However, I will not answer content-based questions after the exam is posted. Note that I may not be able to answer questions sent after 8pm on Tuesday night.
The exam should take no more than 3 hours, but you may use as much time as you need.
Personal computer issues are not an excuse for not completing the exam on time. As a backup, you can use the computers in the Jepson computer lab or in the library.
Good luck!
The data for this exam consists of a subset of bird observations from France between 2010 and 2019. The data comes from https://doi.org/10.15468/dl.bwuax7. I have taken a random sample of 100k sightings from the 100 most common bird species.
Below is the main dataset, with one row per observation of a bird. It includes the time, location, and scientific name of the bird. I will add a geospatial attribute to the dataset for you, as it will be needed in the questions below.
source("../funs/funs.R")
birds <- read_csv("../data/french_birds.csv.bz2")
birds <- st_as_sf(birds, coords = c("lon", "lat"), crs = 4326, remove = FALSE)
birds
## Simple feature collection with 100000 features and 4 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -5.1342 ymin: 41.43501 xmax: 9.55571 ymax: 51.0729
## Geodetic CRS: WGS 84
## # A tibble: 100,000 × 5
## date lon lat scientific_name geometry
## * <date> <dbl> <dbl> <chr> <POINT [°]>
## 1 2010-01-13 2.11 49.3 Corvus frugilegus Linn… (2.10596 49.33126)
## 2 2010-01-13 1.32 43.6 Emberiza schoeniclus (… (1.32246 43.64879)
## 3 2010-01-13 -3.39 47.7 Gallinago gallinago (L… (-3.39164 47.71596)
## 4 2010-01-13 6.39 44.5 Motacilla alba Linnaeu… (6.39454 44.52672)
## 5 2010-01-13 2.37 49.9 Prunella modularis (Li… (2.3745 49.87215)
## 6 2010-01-13 2.11 49.3 Pyrrhula pyrrhula (Lin… (2.10596 49.33126)
## 7 2010-01-13 1.61 50.5 Turdus pilaris Linnaeu… (1.60765 50.45283)
## 8 2010-01-13 -0.944 44.7 Vanellus vanellus (Lin… (-0.94404 44.71436)
## 9 2010-01-14 1.95 49.0 Carduelis chloris (Lin… (1.94986 49.0017)
## 10 2010-01-14 6.82 47.7 Certhia brachydactyla … (6.82303 47.74099)
## # … with 99,990 more rows
We also have a metadata table describing each of the bird species. You can join to the birds data using the key "scientific_name". Most important for the exam questions are the columns species_common (a common name for this bird species) and order_common (a common name for a larger grouping of birds that this species is a part of).
species <- read_csv("../data/french_bird_species.csv.bz2")
species
## # A tibble: 100 × 11
## scienti…¹ speci…² order…³ kingdom phylum class order family genus species
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Accipite… Eurasi… Hawk Animal… Chord… Aves Acci… Accip… Acci… Accipi…
## 2 Acroceph… Sedge … Sparrow Animal… Chord… Aves Pass… Acroc… Acro… Acroce…
## 3 Acroceph… Common… Sparrow Animal… Chord… Aves Pass… Acroc… Acro… Acroce…
## 4 Aegithal… Long-t… Sparrow Animal… Chord… Aves Pass… Aegit… Aegi… Aegith…
## 5 Alauda a… Eurasi… Sparrow Animal… Chord… Aves Pass… Alaud… Alau… Alauda…
## 6 Alcedo a… Common… Kingfi… Animal… Chord… Aves Cora… Alced… Alce… Alcedo…
## 7 Anas cre… Eurasi… Waterf… Animal… Chord… Aves Anse… Anati… Anas Anas c…
## 8 Anas pla… Mallard Waterf… Animal… Chord… Aves Anse… Anati… Anas Anas p…
## 9 Anthus p… Meadow… Sparrow Animal… Chord… Aves Pass… Motac… Anth… Anthus…
## 10 Anthus t… Tree p… Sparrow Animal… Chord… Aves Pass… Motac… Anth… Anthus…
## # … with 90 more rows, 1 more variable: iucn <chr>, and abbreviated
## # variable names ¹scientific_name, ²species_common, ³order_common
We also have a spatial polygon dataset describing the 96 French Départements (roughly the equivalent of a state/county). Below, I am taking a subset to only include those regions in France within Europe, which is the extent of the birds data from above.
france <- read_sf("../data/france_departement.geojson")
france <- slice(france, 1:96)
france
## Simple feature collection with 96 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -5.14026 ymin: 41.33363 xmax: 9.55996 ymax: 51.089
## Geodetic CRS: WGS 84
## # A tibble: 96 × 3
## departement departement_name geometry
## <chr> <chr> <MULTIPOLYGON [°]>
## 1 01 Ain (((4.78021 46.17668, 4.78024 46.1890…
## 2 02 Aisne (((3.17296 50.01131, 3.17382 50.0118…
## 3 03 Allier (((3.03207 46.79491, 3.03424 46.7908…
## 4 04 Alpes-de-Haute-Provence (((5.67604 44.19143, 5.67817 44.1905…
## 5 05 Hautes-Alpes (((6.26057 45.12685, 6.26417 45.1264…
## 6 06 Alpes-Maritimes (((7.06711 43.51365, 7.06665 43.5148…
## 7 07 Ardèche (((4.48313 45.23645, 4.4879 45.23218…
## 8 08 Ardennes (((4.23316 49.95775, 4.2369 49.95858…
## 9 09 Ariège (((1.68842 43.27355, 1.69139 43.2717…
## 10 10 Aube (((3.41479 48.39027, 3.41555 48.3937…
## # … with 86 more rows
Finally, we also have a dataset describing the population of each departement.
population <- read_csv("../data/france_departement_population.csv")
population <- slice(population, 1:96)
population
## # A tibble: 96 × 2
## departement population
## <chr> <dbl>
## 1 01 643350
## 2 02 534490
## 3 03 337988
## 4 04 163915
## 5 05 141284
## 6 06 1083310
## 7 07 325712
## 8 08 273579
## 9 09 153153
## 10 10 310020
## # … with 86 more rows
For the first question, produce a bar plot with day of the week on the x-axis and a count of the number of bird sightings in our dataset on the y-axis. Use full names for the days of the week.
birds %>%
  mutate(weekday = wday(date, label = TRUE, abbr = FALSE)) %>%
  group_by(weekday) %>%
  summarize(n = n()) %>%
  ggplot(aes(weekday, n)) +
    geom_col()
Create a scatterplot with one point for each day of the year in 2015. On the x-axis put the date and on the y-axis put the number of observations of birds found on that day in 2015. Have the x-axis use the abbreviated month names with breaks and minor breaks given once per month.
Hint: Use the formatting code “%b”.
birds %>%
  filter(year(date) == 2015) %>%
  group_by(date) %>%
  summarize(n = n()) %>%
  ggplot(aes(date, n)) +
    geom_point() +
    scale_x_date(
      date_breaks = "month",
      date_minor_breaks = "month",
      date_labels = "%b"
    )
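Note: the date_labels argument takes a strftime-style format code, and "%b" stands for the abbreviated month name. As a minimal sketch of what that code does on its own (the dates below are made up for illustration, and the exact month abbreviations depend on your system locale):

```r
# Twelve illustrative dates, one at the start of each month of 2015.
month_starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)

# "%b" formats each date as an abbreviated month name (locale-dependent).
month_labels <- format(month_starts, "%b")
month_labels
```

Other codes can be combined in the same string; for example "%b %Y" would add the four-digit year after the month.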
Create a bar plot with common order names on the y-axis and, on the x-axis, a count of the number of birds across the whole dataset from each order. Arrange the y-axis categories in descending or ascending order (your choice).
birds %>%
  inner_join(species, by = "scientific_name") %>%
  group_by(order_common) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(n) %>%
  mutate(order_common = fct_inorder(order_common)) %>%
  ggplot(aes(n, order_common)) +
    geom_col()
For this question, plot the spatial locations of the birds data using the CRS projection 27561. Set the size of the points to 0.3 to make the plot easier to read.
Note: Feel free to change the figure height to make the plot look best on your screen.
birds %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(size = 0.3)
For this question, plot the spatial dataset of French departements from the dataset france, coloring the departements based on their population and using the CRS projection 27561. Use the following color scale in the plot:
scale_fill_distiller(trans = "log2", palette = "Spectral", guide = "legend", n.breaks = 5)
france %>%
  left_join(population, by = "departement") %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(aes(fill = population), size = 0) +
    scale_fill_distiller(
      trans = "log2", palette = "Spectral", guide = "legend", n.breaks = 5
    )
In this question, do a spatial join of birds into france. Then, for each bird species, compute the proportion of total sightings that occurred where departement == "75" (that's the code for Paris). Order the results in descending order and display the common species name for each species to show those birds most commonly associated with Paris.
If your code is running very slowly, note that you can (optionally) use as_tibble() after the spatial join to speed things up.
Hint: You should find that the bird most strongly associated with Paris is the "Rock dove". It sounds beautiful, but if you do an image search online you'll see that this is the nasty pigeon-type bird that has invaded large cities all over the world.
birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  left_join(species, by = "scientific_name") %>%
  group_by(species_common) %>%
  # the mean of a TRUE/FALSE vector is the proportion of TRUE values
  summarize(avg_paris = mean(departement == "75")) %>%
  arrange(desc(avg_paris))
## # A tibble: 100 × 2
## species_common avg_paris
## <chr> <dbl>
## 1 Rock dove 0.0502
## 2 House sparrow 0.00791
## 3 Short-toed treecreeper 0.00721
## 4 Black-headed gull 0.00673
## 5 Grey wagtail 0.00670
## 6 Carrion crow 0.00543
## 7 Eurasian sparrowhawk 0.00540
## 8 European greenfinch 0.00532
## 9 Eurasian blue tit 0.00496
## 10 Mallard 0.00389
## # … with 90 more rows
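Note: the proportion in the solution above comes from taking the mean of a logical vector. As a minimal sketch of why that works, here is the same idea on a tiny made-up vector (the departement codes below are hypothetical and are not taken from the exam data):

```r
# Hypothetical departement codes for five sightings of a single species.
departement <- c("75", "33", "75", "13", "69")

# The comparison returns TRUE/FALSE for each sighting. mean() treats TRUE
# as 1 and FALSE as 0, so the result is the share of sightings in Paris.
in_paris <- departement == "75"
prop_paris <- mean(in_paris)
prop_paris
```

Two of the five made-up sightings are in departement 75, so the result here is 0.4. Inside summarize(), the same expression is evaluated once per group, which gives one proportion per species.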
For this question, apply a spatial join of birds into france. For each unique value of species_common (from the species data), compute the total number of departements in which each bird was observed. Order the table in descending order to see those species that are most pervasive across the entire country.
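birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  inner_join(species, by = "scientific_name") %>%
  group_by(species_common, departement) %>%
  # one row per (species, departement) pair; still grouped by species
  summarize(n = n()) %>%
  # count those rows to get the number of departements per species
  summarize(n_departement = n()) %>%
  arrange(desc(n_departement))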
## # A tibble: 100 × 2
## species_common n_departement
## <chr> <int>
## 1 Common blackbird 96
## 2 Common chiffchaff 96
## 3 Eurasian blue tit 96
## 4 Eurasian collared dove 96
## 5 Eurasian jay 96
## 6 Eurasian wren 96
## 7 Great tit 96
## 8 Carrion crow 95
## 9 Common chaffinch 95
## 10 Common starling 95
## # … with 90 more rows
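Note: the solution above calls summarize() twice. The first call drops the last grouping variable (departement), so the second call counts rows within each species. Here is a minimal sketch of that behavior on a small made-up table (the sightings data below is hypothetical), along with a one-step alternative using n_distinct(), which I am assuming would be an equally valid solution:

```r
library(dplyr)

# Hypothetical sightings: two species observed across a few departements.
sightings <- tibble(
  species_common = c("Great tit", "Great tit", "Great tit", "Mallard", "Mallard"),
  departement = c("01", "01", "02", "75", "75")
)

# Two-step version: the first summarize() leaves one row per
# (species, departement) pair, still grouped by species_common,
# so the second summarize() counts departements per species.
two_step <- sightings %>%
  group_by(species_common, departement) %>%
  summarize(n = n(), .groups = "drop_last") %>%
  summarize(n_departement = n())

# One-step alternative: count the distinct departements directly.
one_step <- sightings %>%
  group_by(species_common) %>%
  summarize(n_departement = n_distinct(departement))

two_step
one_step
```

Both tables report 2 departements for the Great tit and 1 for the Mallard. Setting .groups = "drop_last" only makes the default grouping behavior explicit and silences the message that summarize() otherwise prints.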
Create a dataset called bird_cnt describing the number of birds observed in each departement. Make sure you apply the as_tibble() function right after the join to remove spatial information from this intermediate dataset.
Make sure to both save the dataset bird_cnt and print out the data in the solutions (just as I did with the solution to Question 7 in Notebook 20).
bird_cnt <- birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  group_by(departement) %>%
  summarize(n = n())
bird_cnt
## # A tibble: 96 × 2
## departement n
## <chr> <int>
## 1 01 1070
## 2 02 2075
## 3 03 665
## 4 04 557
## 5 05 1407
## 6 06 1374
## 7 07 712
## 8 08 565
## 9 09 240
## 10 10 861
## # … with 86 more rows
Finally, create a plot of the French departements, using the color of the departement to show the number of birds that were observed in each area. Do this by doing an inner join of france with the dataset bird_cnt you created in the previous question. Use the CRS 27561 projection for the plot and the following fill scale:
scale_fill_distiller(palette = "Spectral", guide = "legend", n.breaks = 10)
Note: You should see that the areas with the most sightings are in the popular birding areas of (1) the Camargue Natural Park in the South of France where the Rhône river meets the Mediterranean Sea, (2) the Médoc and Marais Poitevin Natural Park in the West where the Garonne and Sèvre rivers flow into the Atlantic Ocean, and (3) the Gâtinais Park, a large natural forest just to the southeast of Paris.
france %>%
  inner_join(bird_cnt, by = "departement") %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(aes(fill = n), size = 0) +
    scale_fill_distiller(palette = "Spectral", guide = "legend", n.breaks = 10)