In this file, I’ve tried to explain some best practices for approaching the problems in the lab. To start, I recommend loading all of the libraries you will need right here at the top:

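library(readr)    # functions for reading text files such as CSVs
library(readxl)   # provides read_xlsx() for Excel files
library(tmodels)  # provides the tmod_* functions used below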

Question 1

This is an observational study because the researchers did not (and could not) assign players to the “tall” or “short” group themselves. Therefore, this study cannot establish a causal relationship on its own.

Question 2

The authors mostly interpret the results correctly, hedging their conclusions with phrases such as “predictive value”, “may impair”, and “potentially increasing”.

Question 3

See the Excel file lab05-table3.xlsx, which is posted online.

Question 4

I saved the dataset as an Excel file (you can find a link on the class website) and placed it in the same folder as the R Markdown file. Here is the code to read in the dataset and show the first few rows in RStudio:

hlas <- read_xlsx("lab05-table3.xlsx")
hlas
## # A tibble: 43 x 2
##    height las
##    <chr>  <chr>
##  1 tall   present
##  2 tall   present
##  3 tall   present
##  4 tall   present
##  5 tall   present
##  6 tall   present
##  7 tall   present
##  8 tall   absent
##  9 tall   absent
## 10 tall   absent
## # … with 33 more rows

While not required, here is the contingency table to check against the paper:

tmod_contingency(las ~ height, data=hlas)
##          Response
## Predictor absent present
##     short     18       1
##     tall      17       7
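If you cannot find the Excel file, here is a minimal sketch that rebuilds an equivalent dataset directly from the counts in the contingency table above, using only base R. The name `hlas_counts` is my own (hypothetical), and the rows come out in a different order than in the Excel file, which does not affect any of the tests since they only depend on the counts.

```r
# Hypothetical stand-in for the Excel data, rebuilt from the counts above:
# short = 18 absent + 1 present, tall = 17 absent + 7 present
hlas_counts <- data.frame(
  height = rep(c("short", "short", "tall", "tall"), times = c(18, 1, 17, 7)),
  las    = rep(c("absent", "present", "absent", "present"), times = c(18, 1, 17, 7))
)

nrow(hlas_counts)                           # 43 players, as in the tibble
table(hlas_counts$height, hlas_counts$las)  # same counts as tmod_contingency()
```

You could then try passing `data=hlas_counts` to the tmodels functions in place of `hlas`; I'm assuming here that they accept a plain data frame as well as a tibble.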

And here is the statistical hypothesis test:

tmod_fisher_test(las ~ height, data=hlas)
##
## Fisher's Exact Test for Count Data
##
##  H0: Group variable and response categories are independent
##  HA: Group variable and response categories are dependent
##
##  P-value: 0.05928

Amazingly, this is not the p-value reported in the paper (which is given as 0.04)!
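To see what Fisher's exact test is doing, here is a sketch that computes the p-value by hand. With both the row and column totals held fixed, the number of short players with LAS present follows a hypergeometric distribution (8 players with LAS, 35 without, 19 short players), which base R's `dhyper()` gives directly. The two-sided p-value adds up the probabilities of every table that is no more likely than the observed one. I compare the result against base R's `fisher.test()`; I'm assuming, not claiming, that this is what `tmod_fisher_test()` uses internally.

```r
# Observed: 1 of the 19 short players has LAS present
k_obs <- 1

# Hypergeometric probabilities for 0..8 "present" players among the 19 short
# players, given 8 present and 35 absent overall (both margins fixed)
probs <- dhyper(0:8, m = 8, n = 35, k = 19)
p_obs <- dhyper(k_obs, m = 8, n = 35, k = 19)

# Two-sided p-value: total probability of tables no more likely than ours
# (the tiny tolerance guards against floating-point ties)
p_manual <- sum(probs[probs <= p_obs * (1 + 1e-7)])

# Base R's built-in version on the same 2x2 table (rows: short, tall)
tab3 <- matrix(c(18, 1, 17, 7), nrow = 2, byrow = TRUE)
p_builtin <- fisher.test(tab3)$p.value

c(manual = p_manual, fisher.test = p_builtin)
```

Both values should agree with the 0.05928 reported above.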

Question 5

Here are the other two tests (you can ignore the warning from the chi-squared test for now):

tmod_chi_squared_test(las ~ height, data=hlas)
## Warning in stats::chisq.test(newdf$g, newdf$yi): Chi-squared approximation
## may be incorrect
##
## Pearson's Chi-squared test with Yates' continuity correction
##
##  H0: Group variable and response categories are independent
##  HA: Group variable and response categories are dependent
##
##  Test statistic: chi-squared(1) = 2.5785
##  P-value: 0.1083
tmod_z_test_prop(las ~ height, data=hlas)
##
## Z-Test for Equality of Proportions (2 groups)
##
##  H0: true probability of effect is the same between groups
##  HA: true probability of effect is difference between groups
##
##  Test statistic: Z = -2.2554
##  P-value: 0.02411
##
##  Parameter: Pr(present|short) - Pr(present|tall)
##  Point estimate: -0.23904
##  Confidence interval: [-0.446760, -0.031311]

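These p-values are quite different from each other and from Fisher's test: 0.108 for the chi-squared test, 0.024 for the z-test, and 0.059 for Fisher's test, so they even fall on different sides of the usual 0.05 cutoff. The tests usually give similar answers, but they diverge when some of the counts in the contingency table are very small, as they are here.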
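Here is a sketch, using base R's `chisq.test()`, of why the chi-squared result in particular is so sensitive here. It shows the expected counts under independence (R warns whenever any of them is below 5, which is where the warning above comes from) and then runs the test with and without Yates' continuity correction. The names `tab3`, `expected`, `p_yates`, and `p_plain` are my own.

```r
# Table 3 counts (rows = height, columns = LAS status)
tab3 <- matrix(c(18, 1,
                 17, 7),
               nrow = 2, byrow = TRUE,
               dimnames = list(height = c("short", "tall"),
                               las = c("absent", "present")))

# Expected counts under independence; some fall below 5
expected <- suppressWarnings(chisq.test(tab3)$expected)
round(expected, 2)

# Same data, with and without Yates' continuity correction
p_yates <- suppressWarnings(chisq.test(tab3, correct = TRUE)$p.value)
p_plain <- suppressWarnings(chisq.test(tab3, correct = FALSE)$p.value)
c(yates = p_yates, uncorrected = p_plain)
```

The smallest expected count is about 3.5 (short players with LAS present), and turning the correction off moves the p-value from about 0.108 to about 0.045. That is a large shift for a single adjustment, which is a good sign that the approximation is on shaky ground with counts this small.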

Question 6

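Based only on the assumptions, the chi-squared test is the best choice because neither the row totals nor the column totals were fixed ahead of time. (The z-test assumes the group sizes, i.e., the row totals, are fixed, and Fisher's exact test assumes both the row and column totals are fixed.)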

Question 7

Now, read in the second set of data:

plas <- read_xlsx("lab05-table4.xlsx")
plas
## # A tibble: 43 x 2
##    previous_sprain las
##    <chr>           <chr>
##  1 yes             present
##  2 yes             present
##  3 yes             present
##  4 yes             present
##  5 yes             present
##  6 yes             present
##  7 yes             absent
##  8 yes             absent
##  9 yes             absent
## 10 yes             absent
## # … with 33 more rows

Again, here is the contingency table to check against the paper:

tmod_contingency(las ~ previous_sprain, data=plas)
##          Response
## Predictor absent present
##       no      25       2
##       yes     10       6

Finally, here are the three tests:

tmod_z_test_prop(las ~ previous_sprain, data=plas)
##
## Z-Test for Equality of Proportions (2 groups)
##
##  H0: true probability of effect is the same between groups
##  HA: true probability of effect is difference between groups
##
##  Test statistic: Z = -2.2953
##  P-value: 0.02172
##
##  Parameter: Pr(present|no) - Pr(present|yes)
##  Point estimate: -0.30093
##  Confidence interval: [-0.557890, -0.043964]
tmod_chi_squared_test(las ~ previous_sprain, data=plas)
## Warning in stats::chisq.test(newdf$g, newdf$yi): Chi-squared approximation
## may be incorrect
##
## Pearson's Chi-squared test with Yates' continuity correction
##
##  H0: Group variable and response categories are independent
##  HA: Group variable and response categories are dependent
##
##  Test statistic: chi-squared(1) = 4.1849
##  P-value: 0.04079
tmod_fisher_test(las ~ previous_sprain, data=plas)
##
## Fisher's Exact Test for Count Data
##
##  H0: Group variable and response categories are independent
##  HA: Group variable and response categories are dependent
##
##  P-value: 0.03691

Notice once again that our results do not match the paper! This is not good, but it is also not our fault.
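To compare everything side by side, here is a sketch of a small helper, `compare_tests()` (a hypothetical name of my own, not part of tmodels), that computes all three p-values for a 2x2 table using base R. For the z-test it uses an unpooled standard error; that is my assumption about what `tmod_z_test_prop()` does, but it reproduces the Z statistics reported above for both tables (-2.2554 and -2.2953). The chi-squared test uses R's default Yates' continuity correction, as in the output above.

```r
# Hypothetical helper: rows are the two groups, column 2 is the "present"
# outcome. Returns two-sided p-values from three tests.
compare_tests <- function(tab) {
  n  <- rowSums(tab)                    # group sizes
  p  <- tab[, 2] / n                    # proportion "present" in each group
  se <- sqrt(sum(p * (1 - p) / n))      # unpooled standard error
  z  <- (p[1] - p[2]) / se
  c(
    z_test = unname(2 * pnorm(-abs(z))),
    chi_sq = suppressWarnings(chisq.test(tab)$p.value),  # Yates-corrected
    fisher = fisher.test(tab)$p.value
  )
}

tab3 <- matrix(c(18, 1, 17, 7), nrow = 2, byrow = TRUE)  # short / tall
tab4 <- matrix(c(25, 2, 10, 6), nrow = 2, byrow = TRUE)  # no / yes sprain

res <- rbind(table3 = compare_tests(tab3), table4 = compare_tests(tab4))
round(res, 4)
```

Laid out this way, it is easy to see that for Table 3 only the z-test falls below 0.05, while for Table 4 all three tests do.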