Let’s read the word-level keylog data into R.
## # A tibble: 3,329 × 11
## id task word_id word nchar start end duration gap_before gap_after
## <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A 1 1 out 3 1291 1728 437 462 683
## 2 A 1 2 in 2 2411 2564 153 683 286
## 3 A 1 3 the 3 2850 3141 291 286 423
## 4 A 1 4 unch… 10 3564 6074 2510 423 371
## 5 A 1 5 back… 10 6445 7970 1525 371 305
## 6 A 1 6 of 2 8275 8436 161 305 266
## 7 A 1 7 the 3 8702 8993 291 266 676
## 8 A 1 8 nunf… 14 9669 15103 5434 676 717
## 9 A 1 9 end 3 15820 16029 209 717 319
## 10 A 1 10 of 2 16348 16479 131 319 255
## # ℹ 3,319 more rows
## # ℹ 1 more variable: type <chr>
We are going to test the hypothesis that all students have the same average gap between finishing one word and starting the next on the 2nd task. We can grab the data for this using the following R code:
index <- (word$task == 2 & word$gap_after < 2000)
x <- word$gap_after[index]
block <- word$id[index]
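To see what the logical index is doing, here is a minimal sketch on a made-up, five-row stand-in for `word` (the values are hypothetical, not taken from the keylog data). The comparison produces one `TRUE`/`FALSE` per row, and subsetting with it keeps only the `TRUE` rows.

```r
# Hypothetical miniature version of the `word` table (not the real data)
word <- data.frame(
  id = c("A", "A", "B", "B", "B"),
  task = c(1, 2, 2, 2, 2),
  gap_after = c(683, 286, 2500, 423, 371),
  stringsAsFactors = FALSE
)

# TRUE only for rows from task 2 whose gap is below 2000
index <- (word$task == 2 & word$gap_after < 2000)

x <- word$gap_after[index]   # the gaps that pass the filter
block <- word$id[index]      # the student id for each kept gap

index
x
block
```

Row 1 is dropped because it belongs to task 1, and row 3 is dropped because its gap is not below 2000.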
Then, all of the derived variables can be computed with the following:
xbar <- tapply(x, block, mean)
xbar_all <- mean(x)
s2 <- tapply(x, block, var)
n <- tapply(x, block, length)
K <- length(unique(block))
N <- sum(n)
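If `tapply` is new to you, the sketch below runs the same lines on a tiny made-up data set (hypothetical values chosen so the results are easy to check by hand). Each call splits `x` by the labels in `block` and applies the function to each piece.

```r
# Toy data: 9 observations in 3 groups (hypothetical values)
x <- c(10, 12, 14, 20, 22, 30, 34, 38, 42)
block <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")

xbar <- tapply(x, block, mean)     # group means: A = 12, B = 21, C = 36
xbar_all <- mean(x)                # mean of all 9 observations
s2 <- tapply(x, block, var)        # group sample variances
n <- tapply(x, block, length)      # group sizes: A = 3, B = 2, C = 4
K <- length(unique(block))         # number of groups: 3
N <- sum(n)                        # total number of observations: 9

xbar
s2
n
```

The results are named vectors with one entry per group, all in the same group order, which is what makes the component-wise arithmetic in the next step line up.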
R has a nice syntax where we can do vectorized operations. So, if we add/multiply two vectors of the same length, it will do these operations component-wise. If we add/multiply a constant with a vector, it will apply the constant to every entry. For example, here is the denominator of the F-statistic:
## [1] 122995.4
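Here is a small sketch of both vectorized behaviors on toy data (hypothetical values). The last line assumes the denominator of the F-statistic is the pooled within-group variance; that assumption is consistent with the `Residuals` mean square reported in the analysis of variance table in these notes.

```r
# Component-wise arithmetic on two vectors of the same length
c(1, 2, 3) + c(10, 20, 30)
# A constant is applied to every entry
c(1, 2, 3) * 2

# Toy data: 9 observations in 3 groups (hypothetical values)
x <- c(10, 12, 14, 20, 22, 30, 34, 38, 42)
block <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")
s2 <- tapply(x, block, var)
n <- tapply(x, block, length)
K <- length(unique(block))
N <- sum(n)

# (n - 1) * s2 is computed group by group, then summed and divided by N - K
denom <- sum((n - 1) * s2) / (N - K)
denom
```

In `(n - 1) * s2`, the constant 1 is subtracted from every entry of `n`, and the result is multiplied entry by entry with `s2`.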
Below, write the R code to create the F-statistic from the formula you derived on today’s worksheet. Save the result as an object named fstat:
## [1] 7.004472
Use the following code to compute the p-value of the F-statistic. Is the test significant at a 0.001 level?
## [1] 1.554312e-15
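One plausible way to get a p-value like this, assuming it is the upper-tail probability of an F distribution with K - 1 and N - K degrees of freedom, is sketched below. To keep the block self-contained, it plugs in the printed value of the F-statistic and the degrees of freedom shown in the analysis of variance table in these notes, rather than the objects `fstat`, `K`, and `N`.

```r
fstat <- 7.004472   # F-statistic as printed above (rounded)
df1 <- 16           # K - 1
df2 <- 1033         # N - K

# pf() is the cumulative distribution function of the F distribution,
# so one minus it is the probability of a value at least this large
pval <- 1 - pf(fstat, df1, df2)
pval

# Compare against the significance level
pval < 0.001
```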
As with the other tests we have used, there are built-in R functions to do all of this work for us. Here is the code to run the analysis of variance:
## Df Sum Sq Mean Sq F value Pr(>F)
## block 16 13784290 861518 7.004 1.58e-15 ***
## Residuals 1033 127054281 122995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There is a lot of information in the output, some of which we do not need. You should see, though, that it contains the degrees of freedom, the F-statistic, and the p-value. [While very close, you’ll probably notice some numerical instability in the p-value computation; the value in the table is a little different from the one we computed by hand above (at least on my machine).]
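The sketch below shows one way to produce a table like this and pull numbers out of it, using toy data (hypothetical values) and assuming the model was fit as `aov(x ~ block)`, which matches the row label `block` in the output. It also illustrates a likely source of the small p-value discrepancy: the hand-computed 1.554312e-15 is 7 times `.Machine$double.eps`, which is what you would expect if it came from subtracting `pf()` from 1. Asking `pf()` for the upper tail directly avoids that subtraction.

```r
# Toy data: 9 observations in 3 groups (hypothetical values)
x <- c(10, 12, 14, 20, 22, 30, 34, 38, 42)
block <- c("A", "A", "A", "B", "B", "C", "C", "C", "C")

# Fit the one-way analysis of variance and keep the summary table
tab <- summary(aov(x ~ block))[[1]]
tab

# Extract the pieces we care about (row 1 is the grouping variable)
df <- tab[["Df"]]
f_aov <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

# Two ways to turn the F-statistic into a p-value
p_sub <- 1 - pf(f_aov, df[1], df[2])                    # loses precision when pf() is near 1
p_upper <- pf(f_aov, df[1], df[2], lower.tail = FALSE)  # computes the upper tail directly

c(aov = p_aov, subtract = p_sub, upper = p_upper)
```

On toy data with a moderate p-value, the two approaches agree closely; the gap only becomes visible when the p-value is near machine precision, as it is for the keylog data.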