In today’s notebook, we will make a new small dataset and then apply the two two-sample tests that we have learned. The data we will create (it’s a bit silly, but it is easy to generate and gives you a quick sense of how to do some random sampling) is based on the “Random Page” feature of Wikipedia. Following the links below will show you a random Wikipedia page from the English and German versions of the site, respectively.
Working either alone or in pairs, construct a dataset by going to 15 random English pages and 15 random German pages and recording the number of full-sized images on each page (do not include maps or small thumbnails). Then build two vectors in R, x and y, holding the results (I have filled in a set of five values for each that you can add to your 15 to get a larger dataset).
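The following is a minimal sketch of what the two vectors might look like. The values are hypothetical: they are not real Wikipedia counts and not the five values supplied with the exercise. They were chosen only so that their means (1 and 2) and sample variances (1 and 3.5) match the output printed below, which means the tests in this notebook reproduce that output exactly. Which vector holds the English pages and which the German pages is left to you.

```r
# HYPOTHETICAL image counts for illustration only (not real observations).
# Chosen so that mean(x) = 1, mean(y) = 2, var(x) = 1, var(y) = 3.5,
# matching the summary statistics in the printed test output.
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

# With real data, append your own 15 counts to the five provided, e.g.
# x <- c(x, <your 15 counts>)   (placeholder, not runnable as written)

# Quick sanity checks before running any test
length(x)
length(y)
c(mean_x = mean(x), mean_y = mean(y))
c(var_x = var(x), var_y = var(y))
```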
Now all we need to do is run two different statistical tests, each with its confidence interval. Use var.test to compare the variances of the two samples, using the default null hypothesis that there is no difference (that is, the ratio of the two variances equals 1):
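A minimal sketch of the call that produces output like the block below, assuming the vectors are named x and y (as the "data: x and y" line suggests). The data here are hypothetical values, not real observations, with the same means and variances as the printed output.

```r
# Hypothetical data (not real observations); same summary statistics as the output
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

# F test of H0: var(x) / var(y) = 1 (two-sided, 95% interval are the defaults)
vt <- var.test(x, y)
vt
```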
##
## F test to compare two variances
##
## data: x and y
## F = 0.28571, num df = 4, denom df = 4, p-value = 0.2524
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.02974787 2.74415140
## sample estimates:
## ratio of variances
## 0.2857143
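To see where each number in the F test output comes from, here is a sketch that recomputes it by hand with the F distribution functions pf and qf, again on hypothetical data with the same summary statistics. The F statistic is simply the ratio of the sample variances, the degrees of freedom are each sample size minus one, the two-sided p-value doubles the smaller tail, and the interval divides the observed ratio by F quantiles.

```r
# Hypothetical data (not real observations)
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

nx <- length(x)
ny <- length(y)

# "F =" and "ratio of variances" are the same quantity
F_stat <- var(x) / var(y)

# "num df" and "denom df"
df1 <- nx - 1
df2 <- ny - 1

# Two-sided p-value: twice the smaller tail probability
p_value <- 2 * min(pf(F_stat, df1, df2),
                   pf(F_stat, df1, df2, lower.tail = FALSE))

# 95% confidence interval for the true ratio of variances
ci <- F_stat / qf(c(0.975, 0.025), df1, df2)

c(F = F_stat, df1 = df1, df2 = df2, p = p_value)
ci
```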
Then run t.test to test the difference of the means, again using the default null hypothesis that there is no difference (you can choose whether to set the equal-variances flag, var.equal = TRUE, depending on the previous test’s outcome):
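A sketch of both variants on hypothetical data (not real observations). The output below is from the default Welch test, which does not assume equal variances; setting var.equal = TRUE gives the pooled-variance version instead. Note that failing to reject equal variances in the F test does not prove they are equal, so the choice remains a judgment call.

```r
# Hypothetical data (not real observations)
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

# Default: Welch two-sample t-test (var.equal = FALSE)
welch <- t.test(x, y)
welch

# Alternative: classic pooled-variance two-sample t-test
pooled <- t.test(x, y, var.equal = TRUE)
pooled
```

With equal sample sizes, as here, the two t statistics are identical; only the degrees of freedom (and therefore the p-value and confidence interval) differ.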
##
## Welch Two Sample t-test
##
## data: x and y
## t = -1.0541, df = 6.1132, p-value = 0.3317
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.310964 1.310964
## sample estimates:
## mean of x mean of y
## 1 2
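The Welch output can also be rebuilt by hand, which is a useful check on what each line means. This sketch uses the Welch-Satterthwaite approximation for the degrees of freedom (the non-integer "df = 6.1132") on hypothetical data with the same summary statistics as the output above.

```r
# Hypothetical data (not real observations)
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

nx <- length(x)
ny <- length(y)

# Squared standard errors of each sample mean
sx2 <- var(x) / nx
sy2 <- var(y) / ny
se <- sqrt(sx2 + sy2)

diff_means <- mean(x) - mean(y)   # "mean of x" minus "mean of y"
t_stat <- diff_means / se         # "t ="

# Welch-Satterthwaite degrees of freedom ("df =")
df_welch <- (sx2 + sy2)^2 / (sx2^2 / (nx - 1) + sy2^2 / (ny - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df_welch)
ci <- diff_means + c(-1, 1) * qt(0.975, df_welch) * se

c(t = t_stat, df = df_welch, p = p_value)
ci
```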
Make sure that you fully understand what all of the elements of the output mean.
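One way to connect the printed output to its parts is to pull each element out of the result object. Both var.test and t.test return an object of class "htest"; the sketch below inspects its named components for the Welch test, using hypothetical data (not real observations).

```r
# Hypothetical data (not real observations)
x <- c(0, 0, 1, 2, 2)
y <- c(0, 0, 3, 3, 4)

res <- t.test(x, y)

names(res)        # all components stored in the htest object
res$statistic     # the t statistic ("t =")
res$parameter     # degrees of freedom ("df =")
res$p.value       # two-sided p-value
res$conf.int      # 95 percent confidence interval for the difference in means
res$estimate      # "mean of x" and "mean of y"
res$null.value    # hypothesized difference under H0
res$alternative   # direction of the alternative hypothesis
```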