Exam 03: Takehome Portion

Instructions

Deadline: Wednesday, 16 November 2022 at the start of class

This exam has the same format as the notebooks we have been completing in class. In order to complete it correctly, please keep in mind that:

In each section there is exactly one plot or data table that you need to produce. This will often require using multiple data verbs and may also require creating temporary data tables in order to solve the problem. Unless otherwise specified in the question, any valid solution will get full credit.
There are nine questions on this exam. Each is worth 10 points. An additional 10 points is assigned based on your code formatting across the entire exam. This take-home exam will count for half of your Exam 03 grade.
You must Knit the file to an HTML format, print the file, and then bring the exam to class on Wednesday. Some questions require you to build a plot with specific colors; you do not need to print the file in color. I will be able to tell if you did the correct thing based on the code.
You may use any static resources, such as course notes and external websites, but you may not discuss the exam with classmates or anyone else.
I am happy to answer clarifying questions about the exam or to help you with unexpected R errors. However, I will not answer content-based questions after the exam is posted. Note that I may not be able to answer questions sent after 8pm on Tuesday night.
The exam should take no more than 3 hours, but you may use as much time as you need.
Personal computer issues are not an excuse for not completing the exam on time. As a backup, you can use the computers in the Jepson computer lab or in the library.

Good luck!

Bird Observations in France

The data for this exam consists of a subset of bird observations from France between 2010 and 2019. The data comes from https://doi.org/10.15468/dl.bwuax7. I have taken a random sample of 100k sightings from the 100 most common bird species.

Below is the main dataset, with one row per observation of a bird. It includes the date, location, and scientific name of the bird. I add a geospatial attribute to the dataset for you, as it will be needed in the questions below.

source("../funs/funs.R")

birds <- read_csv("../data/french_birds.csv.bz2")
birds <- st_as_sf(birds, coords = c("lon", "lat"), crs = 4326, remove = FALSE)
birds

## Simple feature collection with 100000 features and 4 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -5.1342 ymin: 41.43501 xmax: 9.55571 ymax: 51.0729
## Geodetic CRS:  WGS 84
## # A tibble: 100,000 × 5
##    date          lon   lat scientific_name                    geometry
##  * <date>      <dbl> <dbl> <chr>                           <POINT [°]>
##  1 2010-01-13  2.11   49.3 Corvus frugilegus Linn…  (2.10596 49.33126)
##  2 2010-01-13  1.32   43.6 Emberiza schoeniclus (…  (1.32246 43.64879)
##  3 2010-01-13 -3.39   47.7 Gallinago gallinago (L… (-3.39164 47.71596)
##  4 2010-01-13  6.39   44.5 Motacilla alba Linnaeu…  (6.39454 44.52672)
##  5 2010-01-13  2.37   49.9 Prunella modularis (Li…   (2.3745 49.87215)
##  6 2010-01-13  2.11   49.3 Pyrrhula pyrrhula (Lin…  (2.10596 49.33126)
##  7 2010-01-13  1.61   50.5 Turdus pilaris Linnaeu…  (1.60765 50.45283)
##  8 2010-01-13 -0.944  44.7 Vanellus vanellus (Lin… (-0.94404 44.71436)
##  9 2010-01-14  1.95   49.0 Carduelis chloris (Lin…   (1.94986 49.0017)
## 10 2010-01-14  6.82   47.7 Certhia brachydactyla …  (6.82303 47.74099)
## # … with 99,990 more rows

We also have a metadata table describing each of the bird species. You can join to the birds data using the key “scientific_name”. Most important for the exam questions are the columns species_common (a common name for this bird species) and order_common (a common name for a larger grouping of birds that this species is a part of).

species <- read_csv("../data/french_bird_species.csv.bz2")
species

## # A tibble: 100 × 11
##    scienti…¹ speci…² order…³ kingdom phylum class order family genus species
##    <chr>     <chr>   <chr>   <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>  
##  1 Accipite… Eurasi… Hawk    Animal… Chord… Aves  Acci… Accip… Acci… Accipi…
##  2 Acroceph… Sedge … Sparrow Animal… Chord… Aves  Pass… Acroc… Acro… Acroce…
##  3 Acroceph… Common… Sparrow Animal… Chord… Aves  Pass… Acroc… Acro… Acroce…
##  4 Aegithal… Long-t… Sparrow Animal… Chord… Aves  Pass… Aegit… Aegi… Aegith…
##  5 Alauda a… Eurasi… Sparrow Animal… Chord… Aves  Pass… Alaud… Alau… Alauda…
##  6 Alcedo a… Common… Kingfi… Animal… Chord… Aves  Cora… Alced… Alce… Alcedo…
##  7 Anas cre… Eurasi… Waterf… Animal… Chord… Aves  Anse… Anati… Anas  Anas c…
##  8 Anas pla… Mallard Waterf… Animal… Chord… Aves  Anse… Anati… Anas  Anas p…
##  9 Anthus p… Meadow… Sparrow Animal… Chord… Aves  Pass… Motac… Anth… Anthus…
## 10 Anthus t… Tree p… Sparrow Animal… Chord… Aves  Pass… Motac… Anth… Anthus…
## # … with 90 more rows, 1 more variable: iucn <chr>, and abbreviated
## #   variable names ¹scientific_name, ²species_common, ³order_common
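
As a minimal sketch of the join on scientific_name described above, here is the same idea on made-up data. The tables toy_birds and toy_species are hypothetical stand-ins, not the exam files, and the names in them are shortened for illustration.

```r
library(dplyr)

# Hypothetical toy tables standing in for birds and species
toy_birds <- tibble(
  scientific_name = c("Anas platyrhynchos", "Alcedo atthis", "Anas platyrhynchos")
)
toy_species <- tibble(
  scientific_name = c("Anas platyrhynchos", "Alcedo atthis"),
  species_common = c("Mallard", "Common kingfisher")
)

# Each observation row picks up the metadata of its species
toy_joined <- left_join(toy_birds, toy_species, by = "scientific_name")
toy_joined
```

A left join keeps every observation even if a species is missing from the metadata, whereas an inner join would drop such rows. When every key has a match, the two give the same result.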

We also have a spatial polygon dataset describing the 96 French Départements (roughly the equivalent of a state/county). Below, I take a subset that includes only the departements of France located in Europe, which matches the extent of the birds data above.

france <- read_sf("../data/france_departement.geojson")
france <- slice(france, 1:96)
france

## Simple feature collection with 96 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -5.14026 ymin: 41.33363 xmax: 9.55996 ymax: 51.089
## Geodetic CRS:  WGS 84
## # A tibble: 96 × 3
##    departement departement_name                                     geometry
##    <chr>       <chr>                                      <MULTIPOLYGON [°]>
##  1 01          Ain                     (((4.78021 46.17668, 4.78024 46.1890…
##  2 02          Aisne                   (((3.17296 50.01131, 3.17382 50.0118…
##  3 03          Allier                  (((3.03207 46.79491, 3.03424 46.7908…
##  4 04          Alpes-de-Haute-Provence (((5.67604 44.19143, 5.67817 44.1905…
##  5 05          Hautes-Alpes            (((6.26057 45.12685, 6.26417 45.1264…
##  6 06          Alpes-Maritimes         (((7.06711 43.51365, 7.06665 43.5148…
##  7 07          Ardèche                 (((4.48313 45.23645, 4.4879 45.23218…
##  8 08          Ardennes                (((4.23316 49.95775, 4.2369 49.95858…
##  9 09          Ariège                  (((1.68842 43.27355, 1.69139 43.2717…
## 10 10          Aube                    (((3.41479 48.39027, 3.41555 48.3937…
## # … with 86 more rows

Finally, we also have a dataset describing the population of each departement.

population <- read_csv("../data/france_departement_population.csv")
population <- slice(population, 1:96)
population

## # A tibble: 96 × 2
##    departement population
##    <chr>            <dbl>
##  1 01              643350
##  2 02              534490
##  3 03              337988
##  4 04              163915
##  5 05              141284
##  6 06             1083310
##  7 07              325712
##  8 08              273579
##  9 09              153153
## 10 10              310020
## # … with 86 more rows

1. Observations per day of the week [temporal]

For the first question, produce a bar plot with day of the week on the x-axis and a count of the number of bird sightings in our dataset on the y-axis. Use full names for the days of the week.

birds %>%
  mutate(weekday = wday(date, label = TRUE, abbr = FALSE)) %>%
  group_by(weekday) %>%
  summarize(n = n()) %>%
  ggplot(aes(weekday, n)) +
    geom_col()
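
To see what the wday() call in the solution does, here is a small sketch on a single date. The variable names are hypothetical, and the printed label depends on your session locale.

```r
library(lubridate)

# Hypothetical example date (the same as the first date printed in birds)
toy_date <- as.Date("2010-01-13")

# Default: day of the week as a number, with Sunday = 1 unless the
# lubridate.week.start option has been changed
toy_num <- wday(toy_date)

# label = TRUE returns an ordered factor; abbr = FALSE asks for full names
toy_label <- wday(toy_date, label = TRUE, abbr = FALSE)
toy_label
```

Because the result is an ordered factor, the bars in the plot follow the order of the week rather than alphabetical order.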

2. Observations over 2015 [temporal]

Create a scatterplot with one point for each day of the year in 2015. On the x-axis put the date and on the y-axis put the number of observations of birds found on that day in 2015. Have the x-axis use the abbreviated month names with breaks and minor breaks given once per month.

Hint: Use the formatting code "%b".

birds %>%
  filter(year(date) == 2015) %>%
  group_by(date) %>%
  summarize(n = n()) %>%
  ggplot(aes(date, n)) +
    geom_point() +
    scale_x_date(
      date_breaks = "month",
      date_minor_breaks = "month",
      date_labels = "%b"
    )
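
The "%b" code can be tried outside of a plot with base R's format(). This is only a sketch; the dates are illustrative and the month abbreviations you see depend on your locale.

```r
# Twelve month-start dates in 2015 (illustrative values only)
month_starts <- seq(as.Date("2015-01-01"), by = "month", length.out = 12)

# "%b" formats each date as an abbreviated month name
month_labels <- format(month_starts, "%b")
month_labels
```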

3. Observations by order

Create a bar plot with common order names on the y-axis and, on the x-axis, a count of the number of birds across the whole dataset from each order. Arrange the y-axis categories in descending or ascending order (your choice).

birds %>%
  inner_join(species, by = "scientific_name") %>%
  group_by(order_common) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(n) %>%
  mutate(order_common = fct_inorder(order_common)) %>%
  ggplot(aes(n, order_common)) +
    geom_col()
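
The arrange() followed by fct_inorder() step is what fixes the order of the bars. A short sketch with a hypothetical vector of names shows the difference from a plain factor:

```r
library(forcats)

# Hypothetical names, assumed to be already sorted by their counts
toy_names <- c("Sparrow", "Hawk", "Kingfisher")

# A plain factor sorts its levels, so the count ordering is lost
levels(factor(toy_names))

# fct_inorder() keeps the levels in order of first appearance
toy_fct <- fct_inorder(toy_names)
levels(toy_fct)
```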

4. Spatial distribution of observations [spatial] (easy)

For this question, plot the spatial locations of the birds data using the CRS projection 27561. Set the size of the points to 0.3 to make the plot easier to read.

Note: Feel free to change the figure height to make the plot look best on your screen.

birds %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(size = 0.3)
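
As a rough sketch of what st_transform() does here, the code below builds two made-up points in longitude and latitude and reprojects them. The object names and coordinates are hypothetical and not part of the exam data.

```r
library(sf)

# Hypothetical toy points in longitude/latitude (WGS 84, EPSG:4326)
toy_pts <- data.frame(lon = c(2.35, 5.37), lat = c(48.85, 43.30))
toy_sf <- st_as_sf(toy_pts, coords = c("lon", "lat"), crs = 4326, remove = FALSE)

# Reproject to the CRS used in the question
toy_proj <- st_transform(toy_sf, 27561)

# The CRS code changes, and the coordinates are no longer in degrees
st_crs(toy_proj)$epsg
st_coordinates(toy_proj)
```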

5. Spatial distribution of observations [spatial] (easy)

For this question, plot the French departements from the dataset france, filling each departement based on its population and using the CRS projection 27561. Use the following color scale in the plot:

scale_fill_distiller(trans = "log2", palette = "Spectral", guide = "legend", n.breaks = 5)

france %>%
  left_join(population, by = "departement") %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(aes(fill = population), size = 0) +
    scale_fill_distiller(
      trans = "log2", palette = "Spectral", guide = "legend", n.breaks = 5
    )

6. Birds in Paris [spatial]

In this question, do a spatial join of birds into france. Then, for each bird species, compute the proportion of total sightings that occurred where departement == "75" (that’s the code for Paris). Order the results in descending order and display the common species name for each species to show those birds most commonly associated with Paris.

If your code is running very slowly, note that you can (optionally) use as_tibble() after the spatial join to speed things up.

Hint: You should find that the bird most strongly associated with Paris is the “Rock dove”. It sounds beautiful, but if you do an image search online you’ll see that this is the nasty pigeon-type bird that has invaded large cities all over the world.

birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  left_join(species, by = "scientific_name") %>%
  group_by(species_common) %>%
  summarize(avg_paris = mean(departement == "75")) %>%
  arrange(desc(avg_paris))

## # A tibble: 100 × 2
##    species_common         avg_paris
##    <chr>                      <dbl>
##  1 Rock dove                0.0502 
##  2 House sparrow            0.00791
##  3 Short-toed treecreeper   0.00721
##  4 Black-headed gull        0.00673
##  5 Grey wagtail             0.00670
##  6 Carrion crow             0.00543
##  7 Eurasian sparrowhawk     0.00540
##  8 European greenfinch      0.00532
##  9 Eurasian blue tit        0.00496
## 10 Mallard                  0.00389
## # … with 90 more rows
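
The solution relies on the fact that the mean of a logical vector is the proportion of TRUE values. A minimal base R sketch, using a made-up vector:

```r
# Hypothetical flags: was each of five sightings in departement "75"?
in_paris <- c(TRUE, FALSE, FALSE, TRUE, FALSE)

# TRUE counts as 1 and FALSE as 0, so the mean is the share of TRUE values
prop_paris <- mean(in_paris)
prop_paris

# A missing value makes the mean NA unless na.rm = TRUE is set
mean(c(TRUE, NA, FALSE), na.rm = TRUE)
```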

7. Distribution of bird sightings [spatial]

For this question, apply a spatial join of birds into france. For each unique value of species_common (from the species data), compute the total number of departements in which each bird was observed. Order the table in descending order to see those species that are most pervasive across the entire country.

birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  inner_join(species, by = "scientific_name") %>%
  group_by(species_common, departement) %>%
  summarize(n = n()) %>%
  summarize(n_departement = n()) %>%
  arrange(desc(n_departement))

## # A tibble: 100 × 2
##    species_common         n_departement
##    <chr>                          <int>
##  1 Common blackbird                  96
##  2 Common chiffchaff                 96
##  3 Eurasian blue tit                 96
##  4 Eurasian collared dove            96
##  5 Eurasian jay                      96
##  6 Eurasian wren                     96
##  7 Great tit                         96
##  8 Carrion crow                      95
##  9 Common chaffinch                  95
## 10 Common starling                   95
## # … with 90 more rows
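
Since any valid solution gets full credit, one alternative to the double summarize() is to count distinct departements directly with n_distinct(). This is only a sketch on a hypothetical toy table, not the exam data:

```r
library(dplyr)

# Hypothetical toy sightings: one row per observation
toy <- tibble(
  species_common = c("Mallard", "Mallard", "Mallard", "Rock dove"),
  departement = c("01", "01", "02", "75")
)

# Count how many different departements each species appears in
toy_counts <- toy %>%
  group_by(species_common) %>%
  summarize(n_departement = n_distinct(departement)) %>%
  arrange(desc(n_departement))
toy_counts
```

Here the Mallard appears three times but in only two departements, so repeated sightings in the same departement are not double counted.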

8. Spatial distribution of observations [spatial]

Create a dataset called bird_cnt describing the number of birds observed in each departement. Make sure you apply the as_tibble() function right after the join to remove spatial information from this intermediate dataset.

Make sure to both save the dataset bird_cnt and print out the data in the solutions (just as I did with the solution to Question 7 in Notebook 20).

bird_cnt <- birds %>%
  spatial_join(france) %>%
  as_tibble() %>%
  group_by(departement) %>%
  summarize(n = n())

bird_cnt

## # A tibble: 96 × 2
##    departement     n
##    <chr>       <int>
##  1 01           1070
##  2 02           2075
##  3 03            665
##  4 04            557
##  5 05           1407
##  6 06           1374
##  7 07            712
##  8 08            565
##  9 09            240
## 10 10            861
## # … with 86 more rows

9. Spatial distribution of observations, cont. [spatial]

Finally, create a plot of the French departements, using the color of the departement to show the number of birds that were observed in each area. Do this by doing an inner join of france with the dataset bird_cnt you created in the previous question. Use the CRS 27561 projection for the plot and the following fill scale:

scale_fill_distiller(palette = "Spectral", guide = "legend", n.breaks = 10)

Note: You should see that the areas with the most sightings are in the popular birding areas of (1) the Camargue Natural Park in the South of France where the Rhône river meets the Mediterranean Sea, (2) the Médoc and Marais Poitevin Natural Park in the West where the Garonne and Sèvre rivers flow into the Atlantic Ocean, and (3) the Gâtinais Park, a large natural forest just to the southeast of Paris.

france %>%
  inner_join(bird_cnt, by = "departement") %>%
  st_transform(27561) %>%
  ggplot() +
    geom_sf(aes(fill = n), size = 0) +
    scale_fill_distiller(palette = "Spectral", guide = "legend", n.breaks = 10)