Let’s read the keylog data into R again.
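library(tidyverse)  # provides read_csv() and filter()
kl <- read_csv("../data/keylog.csv")
kl$key[is.na(kl$key)] <- " "  # treat keys that were read in as NA as the space bar
kl <- filter(kl, !is.na(gap1), !is.na(gap2))  # keep rows where both gaps are known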
kl
## # A tibble: 19,859 × 7
## id task time duration key gap1 gap2
## <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 A 1 226 96 "F" 359 263
## 2 A 1 585 121 "a" 244 123
## 3 A 1 829 64 "r" 140 76
## 4 A 1 969 73 " " 322 249
## 5 A 1 1291 76 "o" 211 135
## 6 A 1 1502 64 "u" 226 162
## 7 A 1 1728 64 "t" 213 149
## 8 A 1 1941 80 " " 470 390
## 9 A 1 2411 61 "i" 153 92
## 10 A 1 2564 33 "n" 115 82
## # ℹ 19,849 more rows
We start with the same task we looked at last time: comparing the gap after a key press when the key is the space bar with the gap when the key is not the space bar. Basically, do we pause more between words than we do between letters?
x <- kl$gap2[kl$task == 1 & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$task == 1 & kl$key != " " & kl$gap2 < 1000]
Run a T-test using the function t.test. You can set the variances to be equal (the confidence level is not important).
##
## Two Sample t-test
##
## data: x and y
## t = 17.086, df = 15120, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 57.28737 72.13463
## sample estimates:
## mean of x mean of y
## 141.32749 76.61649
We’ve already done this before, but now we should understand all of the output. Would we reject the null hypothesis that the means of these two gaps are the same?
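To see where each number in that output comes from, here is a minimal sketch that rebuilds the equal-variance T-test by hand. The samples are simulated with rnorm, so the means, standard deviations, and sample sizes are assumptions for illustration and not the keylog data.

```r
# Simulated gaps in milliseconds; these are NOT the keylog data.
set.seed(1)
x <- rnorm(200, mean = 140, sd = 120)  # stand-in for gaps after the space bar
y <- rnorm(800, mean = 80, sd = 120)   # stand-in for gaps after other keys

res <- t.test(x, y, var.equal = TRUE)

# Rebuild the pieces of the printed output by hand.
nx <- length(x)
ny <- length(y)
df <- nx + ny - 2                                          # "df"
sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df        # pooled variance
se <- sqrt(sp2 * (1 / nx + 1 / ny))                        # standard error of the difference
t_stat <- (mean(x) - mean(y)) / se                         # "t"
p_val <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided "p-value"
ci <- (mean(x) - mean(y)) + c(-1, 1) * qt(0.975, df) * se  # 95 percent confidence interval

# Each hand-computed value should agree with what t.test() reports.
all.equal(unname(res$statistic), t_stat)
all.equal(res$p.value, p_val)
all.equal(as.numeric(res$conf.int), ci)
```

The null hypothesis is rejected at the 0.05 level exactly when the 95 percent interval for the difference in means does not contain 0, which is the same as the p-value being below 0.05.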
The code below grabs the gap after each press of the space bar for two different students, here student “A” and student “B”:
x <- kl$gap2[kl$id == "A" & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$id == "B" & kl$key == " " & kl$gap2 < 1000]
Starting by copying my code above, create two samples of the gap after pressing the space bar: one for you and one for one of your neighbors (ask them their id):
# Question 02
x <- kl$gap2[kl$id == "A" & kl$key == " " & kl$gap2 < 1000]
y <- kl$gap2[kl$id == "B" & kl$key == " " & kl$gap2 < 1000]
Now, run a T-test for the difference in the gap that each of you has after hitting the space bar. Who takes longer? Is the result significant at the 0.01 level? What is the p-value?
##
## Two Sample t-test
##
## data: x and y
## t = 8.7324, df = 368, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 132.0742 208.8456
## sample estimates:
## mean of x mean of y
## 270.4737 100.0138
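When the printed p-value is cut off at "< 2.2e-16", the stored test object still holds the actual number. The sketch below shows one way to pull it out and compare it with a chosen level. The two samples are simulated stand-ins for two students, and the names alpha, significant, and longer are made up for this example.

```r
# Simulated gaps in milliseconds for two hypothetical students; NOT the keylog data.
set.seed(2)
x <- rnorm(150, mean = 270, sd = 200)
y <- rnorm(220, mean = 100, sd = 150)

res <- t.test(x, y, var.equal = TRUE)

alpha <- 0.01                        # significance level asked for in the question
res$p.value                          # the numeric p-value, not the truncated print
significant <- res$p.value < alpha   # TRUE means we reject at the 0.01 level

# A positive difference in sample means says x has the longer gaps.
diff_means <- unname(res$estimate[1] - res$estimate[2])
longer <- if (diff_means > 0) "x" else "y"
```

The sign of the t statistic carries the same information as diff_means: it is positive when the mean of x is larger than the mean of y.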
Now, using the same data, run a test of whether your gaps after hitting the space bar and your neighbor’s gaps have the same variance.
##
## F test to compare two variances
##
## data: x and y
## F = 1.6677, num df = 151, denom df = 217, p-value = 0.0005588
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 1.246912 2.248102
## sample estimates:
## ratio of variances
## 1.667682
Who has more variability? Is the ratio of the variances significantly different from 1?
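The F test output can be rebuilt by hand in the same way as the T-test. This is a sketch on simulated samples (the sizes 152 and 218 are chosen only so the degrees of freedom resemble the output above), using var.test from base R.

```r
# Simulated gaps in milliseconds; these are NOT the keylog data.
set.seed(3)
x <- rnorm(152, mean = 270, sd = 260)
y <- rnorm(218, mean = 100, sd = 200)

res <- var.test(x, y)

f_stat <- var(x) / var(y)  # both "F" and "ratio of variances" in the output
df_num <- length(x) - 1    # "num df"
df_den <- length(y) - 1    # "denom df"

# Two-sided p-value: twice the smaller tail of the F distribution.
p_lower <- pf(f_stat, df_num, df_den)
p_val <- 2 * min(p_lower, 1 - p_lower)

all.equal(unname(res$statistic), f_stat)
all.equal(res$p.value, p_val)
```

A ratio above 1 means x is the more variable sample. If the variances do differ, calling t.test(x, y) without var.equal = TRUE runs Welch's version of the test, which does not pool the two variances.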
Below, we will read in a different version of the dataset that has keylog information about individual words.
## # A tibble: 3,329 × 11
## id task word_id word nchar start end duration gap_before gap_after
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A 1 1 out 3 1291 1728 437 462 683
## 2 A 1 2 in 2 2411 2564 153 683 286
## 3 A 1 3 the 3 2850 3141 291 286 423
## 4 A 1 4 unch… 10 3564 6074 2510 423 371
## 5 A 1 5 back… 10 6445 7970 1525 371 305
## 6 A 1 6 of 2 8275 8436 161 305 266
## 7 A 1 7 the 3 8702 8993 291 266 676
## 8 A 1 8 nunf… 14 9669 15103 5434 676 717
## 9 A 1 9 end 3 15820 16029 209 717 319
## 10 A 1 10 of 2 16348 16479 131 319 255
## # ℹ 3,319 more rows
## # ℹ 1 more variable: type <chr>
Below, we create two samples of the gap in milliseconds after a word: one for words that end a sentence and one for words that do not.
x <- word$gap_after[word$task == 1 & stri_detect(word$word, fixed = ".") & word$gap_after < 2000]
y <- word$gap_after[word$task == 1 & !stri_detect(word$word, fixed = ".") & word$gap_after < 2000]
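The fixed = "." argument matters because a period is a wildcard in a regular expression. The sketch below shows the difference using base R's grepl in place of stri_detect, so that it runs without extra packages. The words are made up and not taken from the dataset.

```r
words <- c("end.", "of", "the", "road.")  # made-up words, NOT from the dataset

# As a regular expression, "." matches any single character, so every word matches.
grepl(".", words)
# TRUE TRUE TRUE TRUE

# With fixed = TRUE, "." is a literal period, so only the two words containing one match.
ends_sentence <- grepl(".", words, fixed = TRUE)
ends_sentence
# TRUE FALSE FALSE TRUE

words[ends_sentence]   # "end."  "road."
words[!ends_sentence]  # "of"    "the"
```

Note that a fixed match finds a period anywhere in the word, not only at the end, so an abbreviation or a decimal number would also be counted as ending a sentence.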
Run a T-test to see whether there is a difference between the pause at the end of a sentence and the gap between words.
##
## Two Sample t-test
##
## data: x and y
## t = 2.415, df = 2114, p-value = 0.01582
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 22.23627 214.37631
## sample estimates:
## mean of x mean of y
## 530.6591 412.3528
We can look at the words data more next time.