## Learning Objectives

- Apply the `lm_basic` command to compare the means of two groups
- Extend the two-sample t-test to multiple groups
- Apply model-fit metrics to a regression analysis

## Comparing Two Means

The `lm_basic` function allows for much more complex models than
describing a simple mean. Consider a second set of data where
coins have been taken from two different cups:

In this case, we may want to model the mean of both cups. To
do this with `lm_basic`, we just add the new variable to the
formula:

How do we read this new table? The intercept gives the mean
value for the **A** cup and the second term, called a slope,
gives the additional amount needed to get the mean of the
coins from cup **B**. So, the best guess of cup B’s mean is
the intercept plus the slope, which here is equal to 3. What does
the table look like when we add confidence intervals?

The mean of cup A is predicted to be between 0.5872 and 4.913. The difference between cup B’s mean and cup A’s mean is somewhere between -2.8086 and 3.309. Because this interval includes zero, we say that there is no statistical evidence (at the 95% level) that the means of the two cups are different. Think about this statement for a bit. Why would a value of zero be important in this model?
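In symbols, the table can be read as follows. The notation is introduced here for illustration and is not taken from the original notes: $\beta_0$ stands for the intercept and $\beta_1$ for the slope.

$$
\begin{aligned}
\text{mean of cup A} &= \beta_0 \\
\text{mean of cup B} &= \beta_0 + \beta_1 \\
\text{mean of cup B} - \text{mean of cup A} &= \beta_1
\end{aligned}
$$

The second confidence interval in the table is therefore an interval for $\beta_1$, the difference between the two means.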

**Note**: I’m not calling it this in these notes, but just so you are aware
the statistical model we are applying is called a “t-test”.
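To make the connection to the t-test concrete, here is a minimal sketch in base R. It uses `lm()` and `confint()` as stand-ins for `lm_basic`, whose exact interface is not shown in these notes, and the coin values are made up for illustration; they are not the data used above.

```r
# Hypothetical coin data from two cups (made-up values, for illustration only).
coins <- data.frame(
  cup   = factor(rep(c("A", "B"), each = 4)),
  value = c(1, 2, 3, 5, 2, 2, 4, 4)
)

# Base R stand-in for the model in the text: mean of `value` by `cup`.
model <- lm(value ~ cup, data = coins)

coef(model)     # (Intercept) is the mean of cup A; cupB is mean of B minus mean of A
confint(model)  # 95% confidence intervals for both terms

# Classical two-sample t-test with a pooled standard deviation.
tt <- t.test(value ~ cup, data = coins, var.equal = TRUE)

# t.test() reports the interval for A minus B, so flip the sign and the order
# before comparing it with the interval for the slope (B minus A).
slope_ci <- unname(confint(model)["cupB", ])
ttest_ci <- -rev(as.numeric(tt$conf.int))
all.equal(slope_ci, ttest_ci)
```

Under these assumptions the two intervals agree, which is the sense in which the two-group regression and the pooled two-sample t-test are the same model.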

## Comparing Three Categories

Let’s apply this to a more complex situation using the mammals sleep dataset. We can model the average time spent awake as a function of the diet type of a given mammal:

Now the intercept gives the mean for the base level, carnivores, and each of the other values gives the difference between one of the other diet types and that base level. So the predicted mean for hours spent awake for insectivores is 13.6263 + (-4.5662), or about 9 hours. The confidence intervals tell us whether there is evidence that a given diet type is different from carnivores. We see, for example, that there is statistical evidence that insectivores differ from carnivores, but no evidence for distinctions between carnivores and any of the other groups.
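The arithmetic above can be checked with a short sketch. It assumes that the mammals sleep dataset is the `msleep` data shipped with ggplot2 (columns `awake` and `vore`), and it uses base R's `lm()` as a stand-in for `lm_basic`.

```r
# Assumption: the mammals sleep data is ggplot2's `msleep`.
msleep <- ggplot2::msleep

# Base R stand-in for lm_basic: mean hours awake by diet type.
# Rows with a missing `vore` are dropped automatically by lm().
model <- lm(awake ~ vore, data = msleep)
coefs <- coef(model)

# The baseline is the first level alphabetically ("carni"), so the intercept
# is the carnivore mean and every other term is a difference from it.
insecti_mean <- unname(coefs["(Intercept)"] + coefs["voreinsecti"])

# The same quantity computed directly from the data.
direct_mean <- mean(msleep$awake[which(msleep$vore == "insecti")])

all.equal(insecti_mean, direct_mean)
confint(model)  # intervals for the differences from carnivores
```

Adding the intercept to a group's term recovers that group's mean, which is why each interval for a non-baseline term describes a difference from carnivores.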

The careful observer will notice that there is a problem with this
approach: what if we want to compare two values when one is not the
base level? To do so, use the `fct_relevel` command with the new
baseline used as the second parameter:

Now everything is compared to the `insecti` category.
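A rough sketch of the releveling step, again assuming ggplot2's `msleep` and using base R's `lm()` in place of `lm_basic`. The column name `diet` is introduced here only to keep the coefficient names readable.

```r
# Assumption: the mammals sleep data is ggplot2's `msleep`.
msleep <- ggplot2::msleep

# Move "insecti" to the front so that it becomes the baseline level.
msleep$diet <- forcats::fct_relevel(msleep$vore, "insecti")
levels(msleep$diet)  # "insecti" is now listed first

original  <- lm(awake ~ vore, data = msleep)  # baseline: carnivores
releveled <- lm(awake ~ diet, data = msleep)  # baseline: insectivores

# The intercept is now the insectivore mean, and the carnivore term is the
# old insectivore term with its sign flipped.
coef(releveled)
all.equal(unname(coef(releveled)["dietcarni"]),
          -unname(coef(original)["voreinsecti"]))
```

Releveling changes which comparisons the table reports, not the fitted group means themselves.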

## Graphing the Model

Coming back to graphics, notice that we can see the same output using a visual
representation of the data. All we need is the `geom_confint` layer (I had to
remove some missing data before running the plot):

You’ll notice that this is not exactly the same output. The interval for carnivores matches
well, but the one for insectivores is much wider in this model than in the `lm_basic` output.
I do not want to get too into the weeds about why this is, other than to say that it has to do
with the fact that there are very few insectivores:

The `lm_basic` model assumes that all of the data has the same standard deviation, whereas the
graphical model estimates the variation for each group. In most cases the difference is small,
but when we do not have much data there can be sizeable differences.
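The difference between the two kinds of interval can be sketched in base R. This is only an illustration of the idea, assuming ggplot2's `msleep`; it is not necessarily how `geom_confint` or `lm_basic` compute their intervals internally.

```r
# Assumption: the mammals sleep data is ggplot2's `msleep`.
msleep <- ggplot2::msleep
table(msleep$vore)  # group sizes; insectivores are the smallest group

# Interval for the insectivore mean from the regression, which pools the
# standard deviation across all diet types.
model  <- lm(awake ~ vore, data = msleep)
pooled <- predict(model, newdata = data.frame(vore = "insecti"),
                  interval = "confidence")
pooled_width <- unname(pooled[, "upr"] - pooled[, "lwr"])

# Interval using only the insectivores, so the standard deviation is
# estimated from that one small group.
insecti_awake  <- msleep$awake[which(msleep$vore == "insecti")]
separate       <- t.test(insecti_awake)
separate_width <- diff(as.numeric(separate$conf.int))

c(pooled = pooled_width, separate = separate_width)
```

Both intervals are centered on the same group mean; only their widths differ, because one borrows information about the variation from the other groups and the other does not.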

## Model Fit

We will not discuss all of the output of the `reg_table` function in
this course, but one other piece of information will come in handy:
the Multiple R-squared. We define the residual of a model as the
actual observed response minus the fitted mean. So, in this case we
would have the following residuals:

Then, the multiple R-squared is given by:
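Writing $y_i$ for the observed responses, $\hat{y}_i$ for the fitted means, and $\bar{y}$ for the overall mean of the response (notation introduced here, not taken from the original notes), the standard definition is:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

The numerator is the sum of the squared residuals, and the denominator is the sum of the squared residuals from a model with just a single mean.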

This quantity will always be between 0 and 1, with values closer to 1 indicating that the model is describing a large proportion of the variation in the response variable. Notice that the R-squared value is always equal to 0 for a simple model with just a single mean.
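As a rough check of these statements, the sketch below computes the residuals and the R-squared by hand. It assumes ggplot2's `msleep` data and uses base R's `lm()` and `summary()` as stand-ins for `lm_basic` and `reg_table`.

```r
# Assumption: the mammals sleep data is ggplot2's `msleep`.
msleep <- ggplot2::msleep
model  <- lm(awake ~ vore, data = msleep)

# Rows actually used in the fit (those with a missing diet type are dropped).
used <- model.frame(model)

# Residual = observed response minus fitted mean.
res <- used$awake - fitted(model)

# R-squared = 1 - (residual sum of squares) / (total sum of squares).
r2 <- 1 - sum(res^2) / sum((used$awake - mean(used$awake))^2)
all.equal(r2, summary(model)$r.squared)

# A model with just a single mean explains none of the variation.
single_mean <- lm(awake ~ 1, data = used)
summary(single_mean)$r.squared
```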