In today’s notebook, we will make a new small dataset and then apply the two two-sample tests that we have learned. The data we will create (it’s a bit silly, but easy to generate and gives you a quick sense of how to do some random sampling) is based off the “Random Page” feature of Wikipedia. Following the links below will show you a random Wikipedia page from the English and German versions of the site, respectively.

Working either alone or in pairs, construct a dataset by going to 15 random English pages and 15 random German pages and recording the number of full-sized images on each page (do not include maps or small thumbnails). Then, build two vectors in R with the results as follows (I have filled in a set of five that you can add to your 15 to get a larger dataset):

x <- c(0, 1, 0, 2, 2)
y <- c(2, 1, 0, 5, 2)

Now, all we need to do are run two different statistical tests/confidence intervals. Use var.test below to compare the variance of the two samples, using the default null-hypothesis that there is no difference:

# Question 01
var.test(x, y)
## 
##  F test to compare two variances
## 
## data:  x and y
## F = 0.28571, num df = 4, denom df = 4, p-value = 0.2524
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.02974787 2.74415140
## sample estimates:
## ratio of variances 
##          0.2857143

And then, run t.test to test the difference of the means, using the default null-hypothesis that there is no difference (you can choose to use the equal variances flag or not, depending on the previous test’s outcome):

# Question 02
t.test(x, y)
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = -1.0541, df = 6.1132, p-value = 0.3317
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.310964  1.310964
## sample estimates:
## mean of x mean of y 
##         1         2

Make sure that you fully understand what all of the elements of the output mean.