## Objectives

Extend the inference methods from last class to measure the difference between two or more means.

## Comparing Two Means

The `lm_basic` function allows for much more complex models than
describing a simple mean. Consider a second set of data where
coins have been taken from two different cups:

In this case, we may want to model the mean of both cups. To
do this with `lm_basic`, we just add the new variable to the
formula:

How do we read this new table? The intercept gives the mean
value for cup **A**, and the second term, called a slope,
gives the additional amount needed to get the mean of the
coins from cup **B**. So, the best guess of cup B’s mean, the
intercept plus the slope, is equal to 3. What does the table
look like when we add confidence intervals?

The mean of cup A is predicted to be between 0.5872 and 4.913. The difference between cup B’s mean and cup A’s mean is somewhere between -2.8086 and 3.309. Because this interval includes zero, we say that there is no statistical evidence (at the 95% level) that the means of the two cups are different. Think about this statement for a bit. Why would a value of zero be important in this model?
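To make the reading of the table concrete, here is a minimal sketch using base R's `lm` and `confint` as stand-ins for the course functions, whose exact interface is not assumed here. The data frame `coins`, its column names `number` and `cup`, and the values in it are all made up for illustration and are not the data from class.

```r
# Hypothetical data: these values are invented for illustration only.
coins <- data.frame(
  number = c(1, 3, 4, 2, 5, 3, 2, 4),
  cup    = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# Base R lm() stands in for the course's modelling function.
model <- lm(number ~ cup, data = coins)

# The intercept is the mean of cup A; the "cupB" term is the
# additional amount needed to reach the mean of cup B.
coef(model)

# 95% confidence intervals for both terms.
confint(model, level = 0.95)

# Recover the two group means from the coefficients.
mean_a <- coef(model)[["(Intercept)"]]
mean_b <- mean_a + coef(model)[["cupB"]]
c(A = mean_a, B = mean_b)
```

If the interval for the `cupB` row contains zero, the data are consistent with the two cups having the same mean, which is why zero is the value to look for.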

## Comparing Three Categories

Let’s apply this to a more complex situation using the mammals sleep dataset. We can model the average time spent awake as a function of the diet type of a given mammal:

Each of these values now gives the difference between the base level, carnivores, and one of the other diet types. So the predicted mean for hours spent awake for insectivores is 13.6263 + (-4.5662), or about 9 hours. The confidence intervals tell us whether there is evidence that a given diet type differs from carnivores. We see, for example, that there is statistical evidence that insectivores differ from carnivores, but no evidence of a difference between carnivores and any of the other groups.
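The same arithmetic can be sketched in base R. This assumes the mammals sleep data is the `msleep` data frame shipped with the ggplot2 package, with columns `awake` and `vore`; base R's `lm` again stands in for the course functions.

```r
library(ggplot2)  # provides the msleep data frame (an assumption about the data source)

# Base R lm() stands in for the course's modelling function.
# Rows with a missing diet type are dropped automatically.
model <- lm(awake ~ vore, data = msleep)

# The intercept is the carnivore mean; every other term is a
# difference from carnivores.
coef(model)
confint(model, level = 0.95)

# Predicted mean hours awake for insectivores:
# base-level mean plus the insectivore difference.
carni_mean   <- coef(model)[["(Intercept)"]]
insecti_mean <- carni_mean + coef(model)[["voreinsecti"]]
insecti_mean
```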

The careful observer will notice that there is a problem with this
approach: what if we want to compare two values when one is not the
base level? To do so, use the `fct_relevel` command with the new
baseline used as the second parameter:

Now everything is compared to the `insecti` category.
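A minimal sketch of the releveling step, assuming the data is ggplot2's `msleep` and using `fct_relevel` from the forcats package. The copy named `sleep` is a hypothetical helper, and base R's `lm` stands in for the course functions.

```r
library(ggplot2)  # msleep data frame (assumed data source)
library(forcats)  # fct_relevel()

# Work on a copy so the original data frame is left untouched.
sleep <- msleep

# Move "insecti" to the front of the factor levels; the first
# level is the one used as the baseline in the model.
sleep$vore <- fct_relevel(factor(sleep$vore), "insecti")
levels(sleep$vore)

# Base R lm() stands in for the course's modelling function.
model <- lm(awake ~ vore, data = sleep)

# The intercept is now the insectivore mean, and each other
# term is a difference from insectivores.
coef(model)
confint(model, level = 0.95)
```

The fitted group means do not change when the baseline changes; only the comparisons reported in the table do.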

## Model Fit

We will not discuss all of the output of the `reg_table` function in
this course, but one other piece of information will come in handy:
the Multiple R-squared. We define a residual of a model as the
actual observed response minus the fitted mean. So, in this case
we would have the following residuals:

Then, the multiple R-squared is given by:
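In standard notation (the symbols here are assumptions, not taken from the course output), write $y_i$ for the observed responses, $\hat{y}_i$ for the fitted means, and $\bar{y}$ for the overall mean of the response. The usual textbook definition is then:

$$
R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

The numerator is the sum of the squared residuals, and the denominator is the total variation of the response around its overall mean.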

This quantity will always be between 0 and 1, with values closer to 1 indicating that the model describes a large proportion of the variation in the response variable. Notice that the R-squared value is always equal to 0 for a simple model with just a single mean, since such a model explains none of the variation around the overall mean.
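As a check on these claims, the following sketch computes the residuals and the R-squared by hand and compares the result with the value reported by base R's `summary`. It assumes the usual textbook definition (one minus the residual sum of squares over the total sum of squares) and uses ggplot2's `msleep` data; base R's `lm` stands in for the course functions.

```r
library(ggplot2)  # msleep data frame (assumed data source)

# Base R lm() stands in for the course's modelling function.
model <- lm(awake ~ vore, data = msleep)

# Observed responses actually used in the fit
# (rows with a missing diet type were dropped).
observed <- model.response(model.frame(model))

# Residual = observed response minus fitted mean.
res <- observed - fitted(model)

# R-squared by hand, under the textbook definition.
r2_by_hand <- 1 - sum(res^2) / sum((observed - mean(observed))^2)
r2_by_hand
summary(model)$r.squared

# A model with only a single mean has an R-squared of zero.
null_model <- lm(awake ~ 1, data = msleep)
summary(null_model)$r.squared
```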

### Practice

We will work on the next lab for the remainder of the class: lab19.Rmd

Please upload your script to GitHub ahead of the next class.