Key Logs (Individual Keys)

Let’s read the keylog data into R again.

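library(tidyverse)  # read_csv(), filter()
library(stringi)    # stri_detect(), used in the Words section

kl <- read_csv("../data/keylog.csv")
kl$key[is.na(kl$key)] <- " "  # keys read in as NA are treated as spaces
kl <- filter(kl, !is.na(gap1), !is.na(gap2))  # drop rows with a missing gap
kl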

## # A tibble: 19,859 × 7
##    id     task  time duration key    gap1  gap2
##    <chr> <dbl> <dbl>    <dbl> <chr> <dbl> <dbl>
##  1 A         1   226       96 "F"     359   263
##  2 A         1   585      121 "a"     244   123
##  3 A         1   829       64 "r"     140    76
##  4 A         1   969       73 " "     322   249
##  5 A         1  1291       76 "o"     211   135
##  6 A         1  1502       64 "u"     226   162
##  7 A         1  1728       64 "t"     213   149
##  8 A         1  1941       80 " "     470   390
##  9 A         1  2411       61 "i"     153    92
## 10 A         1  2564       33 "n"     115    82
## # ℹ 19,849 more rows

We start with the same task we looked at last time: comparing the gap after a key when that key is the space bar with the gap when it is not. Basically, do we pause more between words than we do between letters?

x <- kl$gap2[kl$task == 1 & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$task == 1 & kl$key != " " & kl$gap2 < 1000]

Run a t-test using the function t.test. You can set the variances to be equal (the confidence level is not important).

# Question 01
t.test(x, y, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = 17.086, df = 15120, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  57.28737 72.13463
## sample estimates:
## mean of x mean of y 
## 141.32749  76.61649

We have run this test before, but now we should understand all of the output. Would we reject the null hypothesis that the means of these two gaps are the same?
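To see where each number in the output comes from, here is a minimal sketch that rebuilds the equal-variance t-test by hand. The samples are simulated stand-ins (the means, standard deviations, and sizes are assumptions, not the keylog data), so the values will not match the output above. Note that the degrees of freedom are the two sample sizes added together, minus 2.

```r
# Simulated stand-ins for the two gap samples (hypothetical, not the keylog data)
set.seed(1)
x <- rnorm(200, mean = 140, sd = 120)
y <- rnorm(1000, mean = 80, sd = 100)

n_x <- length(x)
n_y <- length(y)

# Pooled variance: a weighted average of the two sample variances
s2_pooled <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Test statistic: difference in means divided by its standard error
se <- sqrt(s2_pooled * (1 / n_x + 1 / n_y))
t_stat <- (mean(x) - mean(y)) / se

# Degrees of freedom and two-sided p-value
df <- n_x + n_y - 2
p_value <- 2 * pt(-abs(t_stat), df)

# 95% confidence interval for the difference in means
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se

# Compare the hand computation against the built-in function
res <- t.test(x, y, var.equal = TRUE)
c(manual = t_stat, builtin = unname(res$statistic))
```

If the two printed values agree, the hand computation matches t.test. We reject the null hypothesis at the 0.05 level exactly when the 95 percent confidence interval does not contain 0.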

User Differences

The code below grabs the gap after each press of the space bar for two different students. Here, student “A” and student “B”:

x <- kl$gap2[kl$id == "A" & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$id == "B" & kl$key == " " & kl$gap2 < 1000]

Starting from a copy of my code above, create two samples of the gap after pressing the space bar: one for you and one for a neighbor (ask them for their id):

# Question 02
x <- kl$gap2[kl$id == "A" & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$id == "B" & kl$key == " " & kl$gap2 < 1000]

Now, run a t-test for the difference in the gap that each of you has after hitting the space bar. Who takes longer? Is the result significant at the 0.01 level? What is the p-value?

t.test(x, y, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = 8.7324, df = 368, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  132.0742 208.8456
## sample estimates:
## mean of x mean of y 
##  270.4737  100.0138
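The questions above can also be answered in code by saving the test result and pulling out its pieces. This is a small sketch on simulated samples (hypothetical values, not anyone's real key log); the element names p.value and estimate are part of the object that t.test returns.

```r
# Simulated stand-ins for two students' gaps (hypothetical data)
set.seed(3)
x <- rnorm(150, mean = 270, sd = 190)
y <- rnorm(220, mean = 100, sd = 150)

res <- t.test(x, y, var.equal = TRUE)

# The p-value, and whether it falls below the 0.01 level
res$p.value
significant <- res$p.value < 0.01
significant

# The two sample means; the larger one belongs to whoever takes longer
res$estimate
longer <- names(which.max(res$estimate))
longer
```

Printed output such as p-value < 2.2e-16 only gives a bound, so extracting res$p.value is the way to see the actual number.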

Now, using the same data, test whether your gaps after hitting the space bar and your neighbor's gaps have the same variance.

# Question 03
var.test(x, y)

## 
##  F test to compare two variances
## 
## data:  x and y
## F = 1.6677, num df = 151, denom df = 217, p-value = 0.0005588
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  1.246912 2.248102
## sample estimates:
## ratio of variances 
##           1.667682

Who has more variability? Is this ratio between the variances significant?
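As a rough sketch of what var.test reports, the F statistic is the ratio of the two sample variances, and each degrees of freedom value is a sample size minus 1. The samples below are simulated (hypothetical values); only their sizes, 152 and 218, are chosen to match the degrees of freedom in the output above.

```r
# Simulated stand-ins for two students' gaps (hypothetical data)
set.seed(2)
x <- rnorm(152, mean = 270, sd = 190)
y <- rnorm(218, mean = 100, sd = 150)

# F statistic: ratio of the sample variances
f_stat <- var(x) / var(y)
df_num <- length(x) - 1
df_den <- length(y) - 1

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower <- pf(f_stat, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

# 95% confidence interval for the true ratio of variances
ci <- f_stat / qf(c(0.975, 0.025), df_num, df_den)

# Compare the hand computation against the built-in function
res <- var.test(x, y)
c(manual = f_stat, builtin = unname(res$statistic))
```

A ratio above 1 means the first sample is more variable. The ratio is significant at the 0.05 level when the 95 percent confidence interval does not contain 1.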

Words

Below, we will read in a different version of the dataset that has keylog information about individual words.

word <- read_csv("../data/keylog_word.csv")
word

## # A tibble: 3,329 × 11
##    id     task word_id word  nchar start   end duration gap_before gap_after
##    <chr> <dbl>   <dbl> <chr> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
##  1 A         1       1 out       3  1291  1728      437        462       683
##  2 A         1       2 in        2  2411  2564      153        683       286
##  3 A         1       3 the       3  2850  3141      291        286       423
##  4 A         1       4 unch…    10  3564  6074     2510        423       371
##  5 A         1       5 back…    10  6445  7970     1525        371       305
##  6 A         1       6 of        2  8275  8436      161        305       266
##  7 A         1       7 the       3  8702  8993      291        266       676
##  8 A         1       8 nunf…    14  9669 15103     5434        676       717
##  9 A         1       9 end       3 15820 16029      209        717       319
## 10 A         1      10 of        2 16348 16479      131        319       255
## # ℹ 3,319 more rows
## # ℹ 1 more variable: type <chr>

Below, we create two samples of the gap in milliseconds after a word: one for words that end a sentence and one for words that do not.

x <- word$gap_after[word$task == 1 & stri_detect(word$word, fixed = ".")  & word$gap_after < 2000]
y <- word$gap_after[word$task == 1 & !stri_detect(word$word, fixed = ".") & word$gap_after < 2000]
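To make the filtering step concrete, here is a small sketch on a made-up vector of words and gaps (both hypothetical). It shows why fixed = "." is used: as a regular expression, a dot matches any character, so every non-empty word would count as ending a sentence.

```r
library(stringi)

# Made-up words and the gap after each one, in milliseconds (hypothetical)
words <- c("out", "end.", "of", "day.")
gaps  <- c(300, 900, 250, 2500)

# fixed = "." looks for a literal period
ends_sentence <- stri_detect(words, fixed = ".")
ends_sentence

# regex = "." matches any single character, so every word is flagged
any_char <- stri_detect(words, regex = ".")
any_char

# Same pattern as above: split by sentence end and drop gaps of 2000 ms or more
x <- gaps[ends_sentence & gaps < 2000]
y <- gaps[!ends_sentence & gaps < 2000]
```

Here x keeps only the 900 ms gap after "end." because the 2500 ms gap after "day." is filtered out as too long.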

Run a t-test to see whether the pause at the end of a sentence differs from the gap between words.

t.test(x, y, var.equal = TRUE)

## 
##  Two Sample t-test
## 
## data:  x and y
## t = 2.415, df = 2114, p-value = 0.01582
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##   22.23627 214.37631
## sample estimates:
## mean of x mean of y 
##  530.6591  412.3528

We can look at the words data more next time.

Notebook 06
