Getting Started

Before running this notebook, select “Session > Restart R and Clear Output” in the menu above to start a new R session. This will clear any old data sets and give us a blank slate to start with.

After starting a new session, run the following code chunk to load the libraries and data that we will be working with today.

Chicago Data

Load the Data

Let’s load the data that we will be working with for the remainder of the semester:

comarea <- read_sf(file.path("data", "chicago_community_areas.geojson"))
ziparea <- read_sf(file.path("data", "zip_codes.geojson"))
socio <- read_csv(file.path("data", "census_socioeconomic.csv"))
medical <- read_csv(file.path("data", "chicago_medical_examiner_cases.csv.gz"))

This time, we will look into the temporal components of the data and see how they can be integrated into the spatial visualisations.

Time and Datetime objects

What is a datetime object?

The medical examiner data has two fields that describe specific moments: the time of the incident and the time of death. We call these datetime objects because each one records a specific time on a specific day.

medical
## # A tibble: 34,566 x 14
##    date_incident_iso   date_death_iso      primary_cause   age gender race 
##    <dttm>              <dttm>              <chr>         <dbl> <chr>  <chr>
##  1 2015-01-01 03:25:00 2015-01-01 03:10:00 HYPERTENSIVE…    50 Male   Black
##  2 2014-12-08 18:37:00 2015-01-01 07:15:00 COMPLICATION…    89 Female White
##  3 2014-12-21 23:13:00 2015-01-01 08:45:00 COMPLICATION…    67 Male   Black
##  4 2015-01-01 09:07:00 2015-01-01 09:20:00 COMPLICATION…    31 Male   White
##  5 2015-01-01 09:54:00 2015-01-01 10:10:00 ASPHYXIATION     25 Male   White
##  6 2015-01-01 10:45:00 2015-01-01 11:19:00 HYPERTENSIVE…    71 Male   White
##  7 2015-01-01 10:43:00 2015-01-01 11:22:00 DIABETIC KET…    61 Female Black
##  8 2015-01-01 12:00:00 2015-01-01 13:18:00 ALCOHOL TOXI…    26 Male   White
##  9 2015-01-01 15:14:00 2015-01-01 15:30:00 ARTERIOSCLER…    56 Male   White
## 10 2015-01-01 15:27:00 2015-01-01 15:45:00 CHRONIC OBST…    81 Female White
## # … with 34,556 more rows, and 8 more variables: latino <lgl>,
## #   gun_related <lgl>, opioid_related <lgl>, cold_related <lgl>,
## #   heat_related <lgl>, lon <dbl>, lat <dbl>, residence_zip <chr>
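Under the hood, a datetime value (R's POSIXct class) is stored as the number of seconds since 1970-01-01 00:00:00 UTC. The short sketch below shows this using the first incident time from the table above. It assumes the lubridate package is installed; ymd_hms parses the string in UTC by default.

```r
library(lubridate)

# First incident time from the table above; ymd_hms() assumes UTC by default
x <- ymd_hms("2015-01-01 03:25:00")

class(x)       # "POSIXct" "POSIXt"
as.numeric(x)  # seconds since 1970-01-01 00:00:00 UTC: 1420082700
```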

Notice that these are similar to, but slightly different from, the date objects we used in the previous notes. As with dates, several functions extract particular components of datetime objects. These include all of the functions for dates, as well as ones specific to the time of day:

medical %>%
  select(date_incident_iso) %>%
  mutate(
    year = year(date_incident_iso),
    hour = hour(date_incident_iso),
    minute = minute(date_incident_iso)
  )
## # A tibble: 34,566 x 4
##    date_incident_iso    year  hour minute
##    <dttm>              <dbl> <int>  <int>
##  1 2015-01-01 03:25:00  2015     3     25
##  2 2014-12-08 18:37:00  2014    18     37
##  3 2014-12-21 23:13:00  2014    23     13
##  4 2015-01-01 09:07:00  2015     9      7
##  5 2015-01-01 09:54:00  2015     9     54
##  6 2015-01-01 10:45:00  2015    10     45
##  7 2015-01-01 10:43:00  2015    10     43
##  8 2015-01-01 12:00:00  2015    12      0
##  9 2015-01-01 15:14:00  2015    15     14
## 10 2015-01-01 15:27:00  2015    15     27
## # … with 34,556 more rows
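Each of these functions also works on a single value, which is a quick way to check what it returns. A minimal sketch with lubridate, using the second incident time from the table above (wday() numbers the days starting from Sunday = 1 unless the lubridate.week.start option has been changed):

```r
library(lubridate)

x <- ymd_hms("2014-12-08 18:37:00")

year(x)    # 2014
month(x)   # 12
day(x)     # 8
hour(x)    # 18
minute(x)  # 37
second(x)  # 0
wday(x)    # 2: a Monday, counting Sunday as 1
```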

Sometimes datetime data is recorded more precisely than we need, which makes it hard to group and count. A helpful function for working with such data is floor_date, which rounds each value down to the start of the given time unit:

medical %>%
  select(date_death_iso) %>%
  mutate(hour_death_iso = floor_date(date_death_iso, "hour"))
## # A tibble: 34,566 x 2
##    date_death_iso      hour_death_iso     
##    <dttm>              <dttm>             
##  1 2015-01-01 03:10:00 2015-01-01 03:00:00
##  2 2015-01-01 07:15:00 2015-01-01 07:00:00
##  3 2015-01-01 08:45:00 2015-01-01 08:00:00
##  4 2015-01-01 09:20:00 2015-01-01 09:00:00
##  5 2015-01-01 10:10:00 2015-01-01 10:00:00
##  6 2015-01-01 11:19:00 2015-01-01 11:00:00
##  7 2015-01-01 11:22:00 2015-01-01 11:00:00
##  8 2015-01-01 13:18:00 2015-01-01 13:00:00
##  9 2015-01-01 15:30:00 2015-01-01 15:00:00
## 10 2015-01-01 15:45:00 2015-01-01 15:00:00
## # … with 34,556 more rows

You can use other units, such as “minute”, “day”, “week”, or “2 hour”, depending on your application.
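To see how the unit changes the result, here is a sketch applying floor_date to one of the death times above with several units. It assumes lubridate and its default week start of Sunday; multi-unit values such as "2 hours" count blocks from midnight.

```r
library(lubridate)

x <- ymd_hms("2015-01-01 15:45:00")

floor_date(x, "hour")        # 2015-01-01 15:00:00
floor_date(x, "2 hours")     # 2015-01-01 14:00:00 (2-hour blocks from midnight)
floor_date(x, "15 minutes")  # 2015-01-01 15:45:00 (already on a boundary)
floor_date(x, "day")         # 2015-01-01 00:00:00
floor_date(x, "week")        # 2014-12-28 00:00:00 (the previous Sunday)
```

lubridate also provides ceiling_date and round_date, which take the same unit strings when you want to round up or to the nearest boundary instead.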

Creating datetime objects

As with date objects, it can be useful to create datetime values on the fly. One way is to specify the individual components with make_datetime (any components left out default to 1970-01-01 00:00:00 UTC):

medical %>%
  arrange(date_incident_iso) %>%
  filter(date_incident_iso > make_datetime(2020, 3, 1, 5, 6))
## # A tibble: 7,477 x 14
##    date_incident_iso   date_death_iso      primary_cause   age gender race 
##    <dttm>              <dttm>              <chr>         <dbl> <chr>  <chr>
##  1 2020-03-01 06:36:00 2020-03-01 07:04:00 MULTIPLE GUN…    28 Male   Black
##  2 2020-03-01 06:36:00 2020-03-01 08:03:00 MULTIPLE GUN…    34 Female Black
##  3 2020-03-01 08:44:00 2020-03-01 08:44:00 CHRONIC ETHA…    43 Male   White
##  4 2020-03-01 09:00:00 2020-03-01 09:52:00 COMBINED DRU…    37 Male   Black
##  5 2020-03-01 09:30:00 2020-03-04 17:52:00 UNDETERMINED     70 Female White
##  6 2020-03-01 10:43:00 2020-03-01 11:10:00 HYPERTENSIVE…    40 Female Black
##  7 2020-03-01 12:00:00 2020-03-01 12:33:00 COMBINED COC…    57 Male   Black
##  8 2020-03-01 13:30:00 2020-03-01 13:56:00 ORGANIC CARD…    69 Female White
##  9 2020-03-01 15:15:00 2020-03-01 15:56:00 COMBINED DRU…    52 Female White
## 10 2020-03-01 15:39:00 2020-03-02 01:21:00 HYPERTENSIVE…    63 Male   Black
## # … with 7,467 more rows, and 8 more variables: latino <lgl>,
## #   gun_related <lgl>, opioid_related <lgl>, cold_related <lgl>,
## #   heat_related <lgl>, lon <dbl>, lat <dbl>, residence_zip <chr>

Alternatively, we can specify the entire value as a string with ymd_hms:

medical %>%
  arrange(date_incident_iso) %>%
  filter(date_incident_iso > ymd_hms("2020-03-01 05:06:00"))
## # A tibble: 7,477 x 14
##    date_incident_iso   date_death_iso      primary_cause   age gender race 
##    <dttm>              <dttm>              <chr>         <dbl> <chr>  <chr>
##  1 2020-03-01 06:36:00 2020-03-01 07:04:00 MULTIPLE GUN…    28 Male   Black
##  2 2020-03-01 06:36:00 2020-03-01 08:03:00 MULTIPLE GUN…    34 Female Black
##  3 2020-03-01 08:44:00 2020-03-01 08:44:00 CHRONIC ETHA…    43 Male   White
##  4 2020-03-01 09:00:00 2020-03-01 09:52:00 COMBINED DRU…    37 Male   Black
##  5 2020-03-01 09:30:00 2020-03-04 17:52:00 UNDETERMINED     70 Female White
##  6 2020-03-01 10:43:00 2020-03-01 11:10:00 HYPERTENSIVE…    40 Female Black
##  7 2020-03-01 12:00:00 2020-03-01 12:33:00 COMBINED COC…    57 Male   Black
##  8 2020-03-01 13:30:00 2020-03-01 13:56:00 ORGANIC CARD…    69 Female White
##  9 2020-03-01 15:15:00 2020-03-01 15:56:00 COMBINED DRU…    52 Female White
## 10 2020-03-01 15:39:00 2020-03-02 01:21:00 HYPERTENSIVE…    63 Male   Black
## # … with 7,467 more rows, and 8 more variables: latino <lgl>,
## #   gun_related <lgl>, opioid_related <lgl>, cold_related <lgl>,
## #   heat_related <lgl>, lon <dbl>, lat <dbl>, residence_zip <chr>

These can be useful for manipulating existing dates or for filtering data.
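A small sketch, assuming lubridate, of how the defaults of make_datetime fill in missing components and how the string parsers are named after the order of the components (ymd_hm for strings without seconds, mdy_hms for month/day/year order). All of the values below describe the same moment as the cutoff used in the filters above, except the first two.

```r
library(lubridate)

# Components left out fall back to 1970, January, the 1st, and 00:00:00 UTC
make_datetime()                  # 1970-01-01 00:00:00 UTC
make_datetime(2020)              # 2020-01-01 00:00:00 UTC
make_datetime(2020, 3, 1, 5, 6)  # 2020-03-01 05:06:00 UTC

# Parsers are named after the order of the components in the string
ymd_hms("2020-03-01 05:06:00")
ymd_hm("2020-03-01 05:06")
mdy_hms("03/01/2020 05:06:00")
```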

Time Differences

Adding an integer to (or subtracting one from) a date object shifts the date by that many days. A datetime object works similarly, but shifts by that many seconds:

medical %>%
  select(date_incident_iso) %>%
  mutate(date_incident_iso_plus1 = date_incident_iso + 1)
## # A tibble: 34,566 x 2
##    date_incident_iso   date_incident_iso_plus1
##    <dttm>              <dttm>                 
##  1 2015-01-01 03:25:00 2015-01-01 03:25:01    
##  2 2014-12-08 18:37:00 2014-12-08 18:37:01    
##  3 2014-12-21 23:13:00 2014-12-21 23:13:01    
##  4 2015-01-01 09:07:00 2015-01-01 09:07:01    
##  5 2015-01-01 09:54:00 2015-01-01 09:54:01    
##  6 2015-01-01 10:45:00 2015-01-01 10:45:01    
##  7 2015-01-01 10:43:00 2015-01-01 10:43:01    
##  8 2015-01-01 12:00:00 2015-01-01 12:00:01    
##  9 2015-01-01 15:14:00 2015-01-01 15:14:01    
## 10 2015-01-01 15:27:00 2015-01-01 15:27:01    
## # … with 34,556 more rows
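Because the unit for datetimes is seconds, larger shifts mean either multiplying by hand (3600 seconds per hour) or using lubridate's period helpers such as hours() and days(). A short sketch, assuming lubridate and UTC times (so there are no daylight-saving jumps):

```r
library(lubridate)

d <- ymd("2015-01-01")
x <- ymd_hms("2015-01-01 03:25:00")

d + 1         # Date: 2015-01-02, one day later
x + 1         # datetime: 2015-01-01 03:25:01, one second later
x + 60 * 60   # one hour later, written in seconds
x + hours(1)  # the same shift, written as a period
x + days(1)   # 2015-01-02 03:25:00
```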

We can also take the difference between two datetime objects. Subtracting them gives a difftime object whose unit R picks automatically from the data, so we pass units = "secs" to as.numeric to always get the number of seconds between the two times. Here is the difference between the incident datetime and the death datetime:

medical %>%
  select(date_incident_iso, date_death_iso) %>%
  mutate(diff = as.numeric(date_death_iso - date_incident_iso, units = "secs"))
## # A tibble: 34,566 x 3
##    date_incident_iso   date_death_iso         diff
##    <dttm>              <dttm>                <dbl>
##  1 2015-01-01 03:25:00 2015-01-01 03:10:00    -900
##  2 2014-12-08 18:37:00 2015-01-01 07:15:00 2032680
##  3 2014-12-21 23:13:00 2015-01-01 08:45:00  898320
##  4 2015-01-01 09:07:00 2015-01-01 09:20:00     780
##  5 2015-01-01 09:54:00 2015-01-01 10:10:00     960
##  6 2015-01-01 10:45:00 2015-01-01 11:19:00    2040
##  7 2015-01-01 10:43:00 2015-01-01 11:22:00    2340
##  8 2015-01-01 12:00:00 2015-01-01 13:18:00    4680
##  9 2015-01-01 15:14:00 2015-01-01 15:30:00     960
## 10 2015-01-01 15:27:00 2015-01-01 15:45:00    1080
## # … with 34,556 more rows
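The subtraction returns a difftime object, and R chooses between seconds, minutes, hours, and days based on the smallest absolute difference in the vector. The numbers above are in seconds (-900 is 15 minutes), but with different data the same expression can silently return minutes or hours. A sketch using the fourth row of the table, assuming lubridate for parsing:

```r
library(lubridate)

incident <- ymd_hms("2015-01-01 09:07:00")
death    <- ymd_hms("2015-01-01 09:20:00")

d <- death - incident
units(d)                                    # "mins": chosen automatically
as.numeric(d)                               # 13, in minutes
as.numeric(d, units = "secs")               # 780, always in seconds
difftime(death, incident, units = "hours")  # the unit can also be fixed up front
```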

Time Objects

We can also create a time object, which records a time of day without a corresponding date. This is useful when you want to group or plot datetime objects using only their time component. Note that a time object is stored as a number of seconds, so arithmetic on it does not wrap around at midnight: adding two hours to 23:00:00 gives 25 hours rather than 01:00:00.

The function as_hms creates a time object from a datetime object:

medical %>%
  select(date_death_iso) %>%
  mutate(time_death = as_hms(date_death_iso))
## # A tibble: 34,566 x 2
##    date_death_iso      time_death
##    <dttm>              <time>    
##  1 2015-01-01 03:10:00 03:10     
##  2 2015-01-01 07:15:00 07:15     
##  3 2015-01-01 08:45:00 08:45     
##  4 2015-01-01 09:20:00 09:20     
##  5 2015-01-01 10:10:00 10:10     
##  6 2015-01-01 11:19:00 11:19     
##  7 2015-01-01 11:22:00 11:22     
##  8 2015-01-01 13:18:00 13:18     
##  9 2015-01-01 15:30:00 15:30     
## 10 2015-01-01 15:45:00 15:45     
## # … with 34,556 more rows

Finally, the function hms can be used to create a time object from scratch.
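A minimal sketch, assuming the hms package (which provides both hms and as_hms), of the different ways to build a time object and of the fact that time objects are not capped at 24 hours:

```r
library(hms)

tm <- hms(minutes = 30, hours = 14)  # 14:30:00
as.numeric(tm)                       # 52200 seconds after midnight
as_hms("14:30:00")                   # parse a time from a string
as_hms(52200)                        # or build it from a number of seconds

# No wrap-around at midnight: 23:00:00 plus two hours is 25:00:00
late <- as_hms(as.numeric(hms(hours = 23)) + 2 * 60 * 60)
```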

Application

Let’s see how to put some of these elements together. The plot below shows the number of opioid-related and non-opioid-related deaths investigated by the medical examiner’s office, by hour of the day:

medical %>%
  mutate(hour_death_iso = floor_date(date_death_iso, "hour")) %>%
  mutate(hour_death_iso = as_hms(hour_death_iso)) %>%
  group_by(hour_death_iso, opioid_related) %>%
  summarize(sm_count()) %>%
  ggplot(aes(hour_death_iso, count)) +
    geom_point(aes(color = opioid_related)) +
    geom_line(aes(color = opioid_related)) +
    scale_y_continuous(limits = c(0, NA))
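The pipeline above counts with sm_count(), which is not part of dplyr itself. If that helper is unavailable, dplyr's count() does the same grouping and counting in one step. The sketch below runs on a few made-up rows (hypothetical values, not taken from the medical examiner data) and uses hour(), which returns an integer from 0 to 23 instead of a time object; the groups are the same either way.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy rows standing in for `medical`
toy <- tibble(
  date_death_iso = ymd_hms(c("2015-01-01 03:10:00", "2015-01-01 03:45:00",
                             "2015-01-02 03:05:00", "2015-01-01 11:19:00")),
  opioid_related = c(TRUE, FALSE, TRUE, TRUE)
)

# One row per (hour, opioid_related) combination with its number of deaths
hourly <- toy %>%
  mutate(hour_death = hour(date_death_iso)) %>%
  count(hour_death, opioid_related, name = "count")
```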

Practice

Time to Death

Compute the time between the recorded death and the recorded incident. Order the data with the longest duration at the top. Do you see anything strange? Is there a data issue here?

medical %>%
  mutate(diff_time = as.numeric(date_death_iso - date_incident_iso, units = "secs")) %>%
  arrange(desc(diff_time))
## # A tibble: 34,566 x 15
##    date_incident_iso   date_death_iso      primary_cause   age gender race 
##    <dttm>              <dttm>              <chr>         <dbl> <chr>  <chr>
##  1 1948-05-25 04:00:00 2016-08-01 10:59:00 COMPLICATION…    74 Male   Black
##  2 1973-07-23 00:00:00 2017-07-13 20:04:00 COMPLICATION…    68 Male   White
##  3 1983-02-10 00:00:00 2017-12-22 16:51:00 COMPLICATION…    66 Male   White
##  4 1989-01-01 13:00:00 2019-07-31 15:06:00 COMPLICATION…    48 Male   White
##  5 1989-05-05 00:00:00 2019-09-11 15:55:00 COMPLICATION…    59 Female Black
##  6 1985-01-28 00:00:00 2015-05-30 01:50:00 COMPLICATION…    32 Female Black
##  7 1988-07-14 15:45:00 2017-09-05 18:53:00 COMPLICATION…    66 Male   White
##  8 1986-12-27 01:30:00 2015-08-12 18:19:00 COMPLICATION…    76 Male   White
##  9 1992-02-01 00:00:00 2020-06-29 11:34:00 ASPIRATION P…    58 Male   White
## 10 1991-05-10 18:00:00 2017-03-30 11:58:00 ATHEROSCLERO…    40 Male   Black
## # … with 34,556 more rows, and 9 more variables: latino <lgl>,
## #   gun_related <lgl>, opioid_related <lgl>, cold_related <lgl>,
## #   heat_related <lgl>, lon <dbl>, lat <dbl>, residence_zip <chr>,
## #   diff_time <dbl>
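One way to probe these questions is to express very long gaps in years and to flag negative gaps, where the death is recorded before the incident. The sketch below reuses two incident/death pairs shown in these notes (the first row of the earlier difference table and the top row above) and assumes lubridate, which provides interval() and time_length().

```r
library(lubridate)

# Two incident/death pairs shown earlier in these notes
incident <- ymd_hms(c("2015-01-01 03:25:00", "1948-05-25 04:00:00"))
death    <- ymd_hms(c("2015-01-01 03:10:00", "2016-08-01 10:59:00"))

# Gap in seconds; units = "secs" keeps the unit fixed
diff_secs <- as.numeric(death - incident, units = "secs")

# The same gaps in years, which is easier to read for very long durations
diff_years <- time_length(interval(incident, death), "years")

# TRUE where the death is recorded before the incident
death_before_incident <- diff_secs < 0
```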

I want you to investigate the median duration between the time of the incident and the time of death within the 100 most common primary_cause labels. Create a table with the median time for each of these 100 categories, and arrange the table from the shortest to the longest duration.

medical %>%
  mutate(diff_time = as_hms(as.numeric(date_death_iso - date_incident_iso, units = "secs"))) %>%
  group_by(primary_cause) %>%
  summarize(sm_count(), sm_median(diff_time)) %>%
  arrange(desc(count)) %>%
  slice(1:100) %>%
  select(primary_cause, diff_time_median) %>%
  arrange(diff_time_median)
## # A tibble: 100 x 2
##    primary_cause                   diff_time_median
##    <chr>                           <time>          
##  1 CHRONIC ALCOHOLISM              17'00"          
##  2 DILATED CARDIOMYOPATHY          19'30"          
##  3 HANGING                         20'00"          
##  4 ORGANIC  CARDIOVASCULAR DISEASE 20'00"          
##  5 HEROIN AND FENTANYL TOXICITY    20'00"          
##  6 MULTIPLE STAB WOUNDS            21'00"          
##  7 CHRONIC ETHANOLISM              21'30"          
##  8 INTRAORAL GUNSHOT WOUND         22'00"          
##  9 ORGANIC CARDIOVASCUALR DISEASE. 22'30"          
## 10 ORGANIC CARDIOVASCULAR DISEASE  23'00"          
## # … with 90 more rows

Note: If you have the difference in time in seconds, try using the function as_hms to convert these numbers into hours, minutes, and seconds. Do the results seem reasonable to you? Are there any strange outliers?
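For instance, the second row of the earlier difference table has a gap of 2,032,680 seconds. A sketch, assuming the hms package, of converting that value and a median of a few illustrative gaps (the three values in the median are made up):

```r
library(hms)

gap <- as_hms(2032680)   # 564:38:00, i.e. more than 23 days
as.numeric(gap) / 3600   # about 564.6 hours

# A median computed in seconds converts the same way
as_hms(median(c(1020, 1170, 1380)))  # 00:19:30
```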

Heat and Cold Deaths

Create a plot with hour on the x-axis and month on the y-axis. Show a scatter plot with points showing the total number of cold-related deaths that occurred (incident time, not time of death) at each combination of hour of the day and month. Try to make the plot look nice (no need for fancy labels, but make sure to include real month names and properly formatted times of day).

medical %>%
  filter(cold_related) %>%
  mutate(
    time = as_hms(floor_date(date_incident_iso, "hour")),
    month = month(date_incident_iso, label = TRUE, abbr = FALSE)
  ) %>%
  group_by(month, time) %>%
  summarize(sm_count()) %>%
  ggplot(aes(time, month)) +
    geom_point(aes(size = count), color = "#add8e6") +
    scale_size_area()
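Using label = TRUE is what keeps the y-axis in calendar order: month() then returns an ordered factor with all twelve months as levels, rather than a character vector that would sort alphabetically. A small sketch, assuming lubridate; the exact month names printed depend on the session's LC_TIME locale.

```r
library(lubridate)

x <- ymd_hms(c("2015-12-24 03:00:00", "2016-01-15 22:30:00"))

m <- month(x, label = TRUE, abbr = FALSE)
m              # full month names (text depends on the locale)
is.ordered(m)  # TRUE: levels follow calendar order
nlevels(m)     # 12, even though only two months appear here
as.integer(m)  # 12 1
```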