Teacher Salary

We will start today by looking at a small dataset containing teacher salaries from 2009-2010 for 71 randomly chosen teachers employed by the St. Louis Public School in Michigan.

teachers <- read_csv("https://statsmaths.github.io/stat_data/teachers_pay.csv")
## Parsed with column specification:
## cols(
##   base = col_integer(),
##   degree = col_character(),
##   years = col_double()
## )

The available variables are base, degree, and years.

Using the mean function, what is the average base pay of all teachers in the dataset?

mean(teachers$base)
## [1] 56937.61

Fit a model for the mean of the base pay variable using lm_basic. Save the model as an object called “model”:

model <- lm_basic(base ~ 1, data = teachers)

Using a call to reg_table, find the mean implied by the model:

reg_table(model)
## 
## Call:
## lm_basic(formula = base ~ 1, data = teachers)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -21511  -5764   2976   8422  11292 
## 
## Coefficients:
##             Estimate
## (Intercept)    56938
## 
## Residual standard error: 9029 on 69 degrees of freedom

Does the mean agree with the value you computed using the mean function?

Answer: Yes.
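The agreement is not a coincidence: an intercept-only least-squares fit estimates a single constant, and the constant that minimizes squared error is the sample mean. Here is a minimal sketch using base R's lm in place of the course's lm_basic, with a made-up vector `pay` (hypothetical values, not taken from the dataset):

```r
# Hypothetical salary values, for illustration only (not from teachers_pay.csv)
pay <- c(41000, 52500, 58250, 61000, 66750)

# Intercept-only model: the single coefficient is the fitted constant
fit <- lm(pay ~ 1)
intercept <- coef(fit)[["(Intercept)"]]

# The least-squares constant coincides with the sample mean
print(intercept)
print(mean(pay))
```

Both printed values should match, which is why the (Intercept) row of the regression table can be read as the mean.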

Add a 95% confidence interval to the regression table.

reg_table(model, level = 0.95)
## 
## Call:
## lm_basic(formula = base ~ 1, data = teachers)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -21511  -5764   2976   8422  11292 
## 
## Coefficients:
##             Estimate 2.5 % 97.5 %
## (Intercept)    56938 54785  59091
## 
## Residual standard error: 9029 on 69 degrees of freedom

What is the range of mean salaries implied by the confidence interval?

Answer: The interval runs from 54785 to 59091.
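The interval in the table can be approximately reproduced by hand from the printed summary. The sketch below assumes the usual t-based interval for a mean, and that the 69 residual degrees of freedom correspond to 70 observations used in the fit. The inputs are the rounded values from the output, so the result is only accurate to about a dollar.

```r
# Values read off the regression table above (rounded, so results are approximate)
estimate <- 56938      # intercept, i.e. the sample mean
resid_se <- 9029       # residual standard error
df       <- 69         # residual degrees of freedom
n        <- df + 1     # an intercept-only model uses one degree of freedom

std_error <- resid_se / sqrt(n)   # standard error of the mean
t_crit    <- qt(0.975, df)        # two-sided 95% critical value

lower <- estimate - t_crit * std_error
upper <- estimate + t_crit * std_error
print(round(c(lower, upper)))
```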

Draw a histogram of the base salary values for the entire dataset.

ggplot(teachers, aes(base)) +
  geom_histogram(color = "black", fill = "white", bins = 20)

Do most of the salary values fall within the range given by the 95% confidence interval? Why or why not?

Answer: No, because the confidence interval is trying to capture the mean, not the data.
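A quick simulation can make this concrete. The sketch below uses simulated salaries (the mean of 57000 and standard deviation of 9000 are illustrative assumptions, not estimates from the dataset) and base R's t.test to get the interval, then counts what share of the individual values land inside it.

```r
set.seed(1)
# Simulated salaries; the mean and spread are illustrative assumptions
pay <- rnorm(70, mean = 57000, sd = 9000)

ci <- t.test(pay, conf.level = 0.95)$conf.int   # 95% interval for the mean
inside <- mean(pay >= ci[1] & pay <= ci[2])     # share of observations inside it

print(ci)
print(inside)
```

The interval always contains the sample mean, but only a minority of the individual observations, because its width shrinks with the square root of the sample size while the spread of the data does not.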

Use the filter command to construct a new dataset called masters consisting of just those teachers with a master’s degree.

masters <- filter(teachers, degree == "MA")

Compute a 95% confidence interval for the mean pay of teachers with a master’s degree. Does this range intersect the interval you found for all teachers?

model <- lm_basic(base ~ 1, data = masters)
reg_table(model, level = 0.95)
## 
## Call:
## lm_basic(formula = base ~ 1, data = masters)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16098.7  -5673.2    958.3   5698.8  10436.3 
## 
## Coefficients:
##             Estimate 2.5 % 97.5 %
## (Intercept)    57794 54890  60697
## 
## Residual standard error: 7915 on 30 degrees of freedom

Answer: The range intersects, but is not identical to, the range in the model with all teachers.
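If you want to check the overlap mechanically rather than by eye, one possible sketch is below. The endpoints are copied from the two regression tables; the variable names `all_teachers` and `masters_only` are made up for this example.

```r
# Interval endpoints copied from the two regression tables above
all_teachers <- c(lower = 54785, upper = 59091)
masters_only <- c(lower = 54890, upper = 60697)

# Two intervals overlap when each one starts before the other ends
overlaps <- all_teachers[["lower"]] <= masters_only[["upper"]] &&
  masters_only[["lower"]] <= all_teachers[["upper"]]

# The shared region runs from the larger lower bound to the smaller upper bound
shared <- c(max(all_teachers[["lower"]], masters_only[["lower"]]),
            min(all_teachers[["upper"]], masters_only[["upper"]]))

print(overlaps)
print(shared)
```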

Murder Data

Now load the following dataset containing all murders that occurred in London from 1 January 2006 to 7 September 2011.

london <- read_csv("https://statsmaths.github.io/stat_data/london_murders.csv")
## Parsed with column specification:
## cols(
##   age = col_integer(),
##   year = col_integer(),
##   borough = col_character()
## )

The available variables are age, year, and borough.

Find an 80% confidence interval for the average age of the victim of a murder in London.

model <- lm_basic(age ~ 1, data = london)
reg_table(model, level = 0.8)
## 
## Call:
## lm_basic(formula = age ~ 1, data = london)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.858 -12.858  -3.858  10.142  66.142 
## 
## Coefficients:
##             Estimate  10 %  90 %
## (Intercept)    33.86 33.07 34.65
## 
## Residual standard error: 17.85 on 837 degrees of freedom

Make sure you actually extract the answer here:

Answer: The interval is from 33.07 to 34.65 years.
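For comparison, the same kind of interval can be obtained in base R without the course helpers. This is a sketch on simulated ages (the data frame `victims` and its distribution are assumptions made for illustration), showing that confint on an intercept-only lm fit and a one-sample t.test give the same 80% interval.

```r
set.seed(2)
# Simulated ages; the distribution is an illustrative assumption
victims <- data.frame(age = round(runif(200, min = 1, max = 90)))

fit <- lm(age ~ 1, data = victims)

ci_lm <- confint(fit, level = 0.8)                       # 80% interval for the intercept
ci_tt <- t.test(victims$age, conf.level = 0.8)$conf.int  # same interval from a one-sample t-test

print(ci_lm)
print(ci_tt)
```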

Describe in words how the preceding confidence interval should be interpreted.

Answer: We are using a procedure that, if applied to multiple datasets, would capture the true mean 80% of the time. This procedure found a range from 33.07 to 34.65 years.
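The "80% of the time" statement can be checked by simulation. The sketch below draws many datasets from a population whose mean we choose ourselves (all settings here are assumptions for illustration), builds an 80% interval from each one with t.test, and records how often the interval contains the true mean.

```r
set.seed(3)
true_mean  <- 34     # assumed population mean for the simulation
n_datasets <- 2000   # number of simulated datasets
n_obs      <- 50     # observations per dataset

covered <- replicate(n_datasets, {
  x  <- rnorm(n_obs, mean = true_mean, sd = 18)
  ci <- t.test(x, conf.level = 0.8)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]   # did this interval capture the truth?
})

coverage <- mean(covered)
print(coverage)
```

The proportion should land close to 0.8. Any single interval either contains the true mean or it does not; the 80% describes the long-run behavior of the procedure.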