In this file, I explain some best practices for approaching the problems in the lab. To start, I recommend loading all of the libraries you will need at the top of the file:

```
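library(readr)    # read_csv() and other readers for text files
library(readxl)   # read_xlsx() for Excel files
library(tmodels)  # tmod_* functions for the statistical tests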
```

**Question 1**

This is an observational study because the researchers did not (and could not) assign players to the “tall” or “short” group; height is a trait the players already had. Therefore, this study cannot establish a causal relationship.

**Question 2**

The authors mostly interpret the results correctly, hedging their conclusions with phrases such as “predictive value”, “may impair”, and “potentially increasing”.

**Question 3**

See the Excel file lab05-table3.xlsx, which is posted online.

**Question 4**

I saved the dataset as an Excel file (you can find a link on the class website) and placed it in the same folder as the R Markdown file. Here is the code to read in the dataset and show the first few rows in RStudio:

```
hlas <- read_xlsx("lab05-table3.xlsx")
hlas
```

```
## # A tibble: 43 x 2
## height las
## <chr> <chr>
## 1 tall present
## 2 tall present
## 3 tall present
## 4 tall present
## 5 tall present
## 6 tall present
## 7 tall present
## 8 tall absent
## 9 tall absent
## 10 tall absent
## # … with 33 more rows
```

While not required, here is the contingency table to check against the paper:

`tmod_contingency(las ~ height, data=hlas)`

```
## Response
## Predictor absent present
## short 18 1
## tall 17 7
```
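If you want to check the analysis without downloading the Excel file, here is a minimal sketch that rebuilds an equivalent dataset from the counts in the table above. Based on the printout, it assumes the file has one row per player with the columns `height` and `las`. The names `cell_counts` and `hlas_rebuilt` are only for illustration. The tests depend only on the counts, so running them on this data frame should give the same results.

```r
# Rebuild one row per player from the counts in the contingency table:
# short = 18 absent, 1 present; tall = 17 absent, 7 present.
cell_counts <- c(18, 1, 17, 7)
hlas_rebuilt <- data.frame(
  height = rep(c("short", "short", "tall", "tall"), times = cell_counts),
  las    = rep(c("absent", "present", "absent", "present"), times = cell_counts)
)

nrow(hlas_rebuilt)                           # should be 43, like the tibble
table(hlas_rebuilt$height, hlas_rebuilt$las) # should match the table above
```

With `tmodels` loaded, `tmod_fisher_test(las ~ height, data = hlas_rebuilt)` should then give the same p-value as the test that follows.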

Here is the statistical hypothesis test (Fisher's exact test):

`tmod_fisher_test(las ~ height, data=hlas)`

```
##
## Fisher's Exact Test for Count Data
##
## H0: Group variable and response categories are independent
## HA: Group variable and response categories are dependent
##
## P-value: 0.05928
```

Surprisingly, this does not match the p-value reported in the paper (0.04)!
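As a cross-check, we can run base R's `fisher.test()` directly on the 2x2 table of counts. This is a sketch, not a description of how `tmodels` works internally, but the two should agree. The matrix name `tab3` is my own. Computing the hypergeometric probabilities by hand also gives a two-sided p-value of about 0.0593, so the gap between our result and the paper's 0.04 is not a rounding issue.

```r
# 2x2 table of counts: rows are height, columns are LAS status.
tab3 <- matrix(c(18, 17, 1, 7), nrow = 2,
               dimnames = list(height = c("short", "tall"),
                               las    = c("absent", "present")))

# Two-sided Fisher's exact test from base R (stats is attached by default)
fisher.test(tab3)$p.value
```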

**Question 5**

The other two tests are given below (the warning from the chi-squared test is not something to worry about for now):

`tmod_chi_squared_test(las ~ height, data=hlas)`

```
## Warning in stats::chisq.test(newdf$g, newdf$yi): Chi-squared approximation
## may be incorrect
```

```
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## H0: Group variable and response categories are independent
## HA: Group variable and response categories are dependent
##
## Test statistic: chi-squared(1) = 2.5785
## P-value: 0.1083
```

`tmod_z_test_prop(las ~ height, data=hlas)`

```
##
## Z-Test for Equality of Proportions (2 groups)
##
## H0: true probability of effect is the same between groups
## HA: true probability of effect is difference between groups
##
## Test statistic: Z = -2.2554
## P-value: 0.02411
##
## Parameter: Pr(present|short) - Pr(present|tall)
## Point estimate: -0.23904
## Confidence interval: [-0.446760, -0.031311]
```
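The parameter line shows that the z-test compares the proportion of short players with LAS (1/19) to the proportion of tall players with LAS (7/24). Here is a hand computation of the same quantities. It assumes the test uses the unpooled standard error, because that choice reproduces the output above. The variable names are my own.

```r
n_short <- 19; x_short <- 1   # short players: 1 of 19 have LAS
n_tall  <- 24; x_tall  <- 7   # tall players: 7 of 24 have LAS

p_short <- x_short / n_short
p_tall  <- x_tall / n_tall
est <- p_short - p_tall                        # Pr(present|short) - Pr(present|tall)

# Unpooled standard error (assumed to be what tmod_z_test_prop uses)
se  <- sqrt(p_short * (1 - p_short) / n_short +
            p_tall  * (1 - p_tall)  / n_tall)
z   <- est / se                                # test statistic
p_value <- 2 * pnorm(-abs(z))                  # two-sided p-value
ci  <- est + c(-1, 1) * qnorm(0.975) * se      # 95% confidence interval

round(c(estimate = est, z = z, p_value = p_value), 5)
round(ci, 6)
```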

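These p-values (0.108 for the chi-squared test and 0.024 for the z-test) differ considerably from each other and from Fisher's exact test (0.059). The three tests usually give similar results, but they diverge when some of the counts in the contingency table are very small, as they are here (only one short player had LAS).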
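The small counts also explain the warning we saw from the chi-squared test. If height and LAS were independent, the expected count in each cell would be (row total × column total) / grand total. R's `chisq.test()` gives the "Chi-squared approximation may be incorrect" warning when any expected count is below 5. Here, both of the "present" cells are below that threshold. This is a minimal sketch, and the names `tab3` and `expected` are illustrative.

```r
tab3 <- matrix(c(18, 17, 1, 7), nrow = 2,
               dimnames = list(height = c("short", "tall"),
                               las    = c("absent", "present")))

# Expected count for each cell: row total * column total / grand total
expected <- outer(rowSums(tab3), colSums(tab3)) / sum(tab3)
round(expected, 2)

# chisq.test() warns when any expected count is below 5
any(expected < 5)
```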

**Question 6**

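Based on the assumptions alone, the chi-squared test is the best choice, because neither the row sums nor the column sums were fixed ahead of time. Fisher's exact test treats both sets of sums as fixed, and the z-test treats the group sizes (row sums) as fixed.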
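To make this sampling assumption concrete, here is a small simulation of a study where only the total number of players (43) is fixed. Each player falls at random into one of the four height-by-LAS cells. The cell probabilities are made up purely for illustration. The point is that the row and column sums can change from one simulated study to the next, which is the situation the chi-squared test is designed for.

```r
set.seed(2023)  # any seed works; it only makes the run reproducible

# Hypothetical cell probabilities, in the order
# short/absent, short/present, tall/absent, tall/present
cell_probs <- c(0.40, 0.05, 0.40, 0.15)

# Five simulated studies of 43 players each (one study per column)
sims <- rmultinom(5, size = 43, prob = cell_probs)

colSums(sims)              # the total is always 43
sims[1, ] + sims[2, ]      # number of short players: can vary by study
sims[2, ] + sims[4, ]      # number of players with LAS: can vary by study
```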

**Question 7**

Now, read in the second set of data:

```
plas <- read_xlsx("lab05-table4.xlsx")
plas
```

```
## # A tibble: 43 x 2
## previous_sprain las
## <chr> <chr>
## 1 yes present
## 2 yes present
## 3 yes present
## 4 yes present
## 5 yes present
## 6 yes present
## 7 yes absent
## 8 yes absent
## 9 yes absent
## 10 yes absent
## # … with 33 more rows
```

As before, here is the contingency table to check against the paper:

`tmod_contingency(las ~ previous_sprain, data=plas)`

```
## Response
## Predictor absent present
## no 25 2
## yes 10 6
```

Finally, here are the three tests:

`tmod_z_test_prop(las ~ previous_sprain, data=plas)`

```
##
## Z-Test for Equality of Proportions (2 groups)
##
## H0: true probability of effect is the same between groups
## HA: true probability of effect is difference between groups
##
## Test statistic: Z = -2.2953
## P-value: 0.02172
##
## Parameter: Pr(present|no) - Pr(present|yes)
## Point estimate: -0.30093
## Confidence interval: [-0.557890, -0.043964]
```

`tmod_chi_squared_test(las ~ previous_sprain, data=plas)`

```
## Warning in stats::chisq.test(newdf$g, newdf$yi): Chi-squared approximation
## may be incorrect
```

```
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## H0: Group variable and response categories are independent
## HA: Group variable and response categories are dependent
##
## Test statistic: chi-squared(1) = 4.1849
## P-value: 0.04079
```

`tmod_fisher_test(las ~ previous_sprain, data=plas)`

```
##
## Fisher's Exact Test for Count Data
##
## H0: Group variable and response categories are independent
## HA: Group variable and response categories are dependent
##
## P-value: 0.03691
```

Notice once again that the Fisher's exact test p-value does not match the one reported in the paper! This is not good, but it is also not our fault.
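Finally, here is a sketch that computes all three p-values for a 2x2 table using only base R. It is useful for checking the `tmodels` output, or for the next time a paper's numbers look off. The function name `three_tests` is hypothetical. The z-test part assumes the unpooled standard error, which matches the output above.

```r
three_tests <- function(tab) {
  x <- tab[, 2]          # count with LAS present in each group (row)
  n <- rowSums(tab)      # group sizes
  p <- x / n             # sample proportion with LAS in each group

  # Two-sample z-test with the unpooled standard error
  se <- sqrt(sum(p * (1 - p) / n))
  z  <- (p[1] - p[2]) / se

  c(
    z_test     = unname(2 * pnorm(-abs(z))),
    # Yates-corrected chi-squared test; the small-count warning is silenced
    chi_square = unname(suppressWarnings(chisq.test(tab, correct = TRUE)$p.value)),
    fisher     = unname(fisher.test(tab)$p.value)
  )
}

tab4 <- matrix(c(25, 10, 2, 6), nrow = 2,
               dimnames = list(previous_sprain = c("no", "yes"),
                               las             = c("absent", "present")))
round(three_tests(tab4), 5)
```

Running `three_tests()` on the Table 3 counts, `matrix(c(18, 17, 1, 7), nrow = 2)`, should likewise reproduce the p-values from Questions 4 and 5.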