Today we will look at a dataset of reviews of burritos in southern California. This is a data set collected by a group of college friends who live in the greater San Diego area.

`burrito <- read_csv("https://statsmaths.github.io/stat_data/burrito.csv")`

```
## Parsed with column specification:
## cols(
## location = col_character(),
## cost = col_double(),
## yelp = col_double(),
## google = col_double(),
## chips_included = col_double(),
## hunger = col_double(),
## tortilla = col_double(),
## temp = col_double(),
## meat = col_double(),
## fillings = col_double(),
## meat_filling = col_double(),
## uniformity = col_double(),
## salsa = col_double(),
## synergy = col_double(),
## wrap = col_double(),
## overall = col_double(),
## lat = col_double(),
## lon = col_double()
## )
```

The available variables in the data are:

- location: the name of the restaurant
- cost: total cost for the burrito
- yelp: the average Yelp review score
- google: the average Google review score
- chips_included: equals 1 if chips are included with the burrito
- hunger: score from 1 to 5 on how much the burrito filled up the reviewer
- tortilla: rating from reviewer; 1 (terrible) to 5 (awesome)
- temp: rating from reviewer; 1 (terrible) to 5 (awesome)
- meat: rating from reviewer; 1 (terrible) to 5 (awesome)
- fillings: rating from reviewer; 1 (terrible) to 5 (awesome)
- meat_filling: rating from reviewer; 1 (terrible) to 5 (awesome)
- uniformity: rating from reviewer; 1 (terrible) to 5 (awesome)
- salsa: rating from reviewer; 1 (terrible) to 5 (awesome)
- synergy: rating from reviewer; 1 (terrible) to 5 (awesome)
- wrap: rating from reviewer; 1 (terrible) to 5 (awesome)
- overall: rating from reviewer; 1 (terrible) to 5 (awesome)
- lat: latitude (in degrees North) of the taco restaurant
- lon: longitude (in degrees East) of the taco restaurant

Fit a linear regression predicting the average yelp score of locations in this dataset and look at the model using `reg_table`

:

```
model <- lm_basic(yelp ~ 1, data = burrito)
reg_table(model, level = 0.95)
```

```
##
## Call:
## lm_basic(formula = yelp ~ 1, data = burrito)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.430 -0.305 0.070 0.070 0.570
##
## Coefficients:
## Estimate 2.5 % 97.5 %
## (Intercept) 3.930 3.808 4.052
##
## Residual standard error: 0.4287 on 49 degrees of freedom
```

The average Yelp score of all restaurants in this area is just 3.7; looking at the confidence interval from the model does it seem likely that burrito locations have a higher Yelp rating than locations in general, or is this just noise in the data?

**Answer**: No, it seems like they intentionally selected restaurants with good yelp scores.

The regression model confidence interval requires that the data are sampled independently from the larger population. Why might this dataset not hold to this assumption?

**Answer**: They likely only selected good restaurants or those that were close to where they lived.

Now fit a linear regression that predicts the yelp score as a function of whether the burrito includes chips.

```
model <- lm_basic(yelp ~ chips_included, data=burrito)
reg_table(model, level = 0.95)
```

```
##
## Call:
## lm_basic(formula = yelp ~ chips_included, data = burrito)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41176 -0.30101 0.08824 0.08824 0.58824
##
## Coefficients:
## Estimate 2.5 % 97.5 %
## (Intercept) 3.91176 3.76270 4.061
## chips_included 0.05699 -0.20652 0.320
##
## Residual standard error: 0.4323 on 48 degrees of freedom
## Multiple R-squared: 0.003923, Adjusted R-squared: -0.01683
## F-statistic: 0.1891 on 1 and 48 DF, p-value: 0.6656
```

Describe what the chips coefficient in the previous model means in words:

**Answer**: Its the additional amount that you would expect a restaurant that includes chips with their burritos to have as a yelp score.

Is the slope coefficient statistically significant from zero (in other words, is zero in the confidence interval)?

**Answer**: No, it runs from -0.20652 to 0.320 (a negative to a positive value).

Describe this in terms of your answers above.

**Answer**: There does not seem to be a statistically significant difference in the Yelp score based on whether chips are included with the burrito.

The word *synergy* can be defined as:

the interaction or cooperation of two or more organizations, substances, or other agents to produce a combined effect greater than the sum of their separate effects.

This is not something I would have thought of when considering tacos, but letâ€™s see how synergy effected the scores of both the reviewers as well as Google and Yelp users in general.

First, Iâ€™ll add a variable called high_synergy defined as whether the synergy score is greater than 3.5.

`burrito$high_synergy <- (burrito$synergy > 3.5)`

Fit a regression predicting the Yelp score as a function of the variable `high_synergy`

.

```
model <- lm_basic(yelp ~ high_synergy, data = burrito)
reg_table(model, level = 0.95)
```

```
##
## Call:
## lm_basic(formula = yelp ~ high_synergy, data = burrito)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.27778 -0.21224 -0.01562 0.22222 0.72222
##
## Coefficients:
## Estimate 2.5 % 97.5 %
## (Intercept) 3.777778 3.580078 3.975
## high_synergyTRUE 0.237847 -0.009277 0.485
##
## Residual standard error: 0.4172 on 48 degrees of freedom
## Multiple R-squared: 0.07237, Adjusted R-squared: 0.05305
## F-statistic: 3.745 on 1 and 48 DF, p-value: 0.05887
```

Is there strong evidence that Yelp scores are higher for high synergy burrito restaurants?

**Answer**: No, again the confidence interval extends from a negative value to a positive one.

Now fit a regression predicting the overall score as a function of the variable high_synergy.

```
model <- lm_basic(overall ~ high_synergy, data = burrito)
reg_table(model, level = 0.95)
```

```
##
## Call:
## lm_basic(formula = overall ~ high_synergy, data = burrito)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1963 -0.2641 0.1359 0.3359 0.9037
##
## Coefficients:
## Estimate 2.5 % 97.5 %
## (Intercept) 2.6963 2.4583 2.934
## high_synergyTRUE 1.1678 0.8703 1.465
##
## Residual standard error: 0.5021 on 48 degrees of freedom
## Multiple R-squared: 0.5649, Adjusted R-squared: 0.5558
## F-statistic: 62.31 on 1 and 48 DF, p-value: 3.19e-10
```

Is there evidence that high synergy burrito locations have a higher overall score than low synergy burrito locations?

**Answer**: Yes! The confidence interval runs from 0.87 to 1.465.

Using the model from the previous question, what does the model predict will be the overall score of a location with low synergy?

**Answer**: 2.69.

What does the model predict will be the overall score of a location with high synergy?

**Answer**: 2.69 + 1.1678 = 3.8578.

Finally, fit a regression model that predicts the overall score as a function of the variable `synergy`

coded as a factor. That is, use `overall ~ factor(synergy)`

in the formula. We are doing this so that each value of the synergy score (1, 2, 3, 4, and 5) are treated as a category rather than a continuous variable.

```
model <- lm_basic(overall ~ factor(synergy), data = burrito)
reg_table(model, level = 0.95)
```

```
##
## Call:
## lm_basic(formula = overall ~ factor(synergy), data = burrito)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.05323 -0.25323 0.00417 0.31875 0.72222
##
## Coefficients:
## Estimate 2.5 % 97.5 %
## (Intercept) 2.5000 1.6294 3.371
## factor(synergy)2 -0.2222 -1.1399 0.695
## factor(synergy)3 0.6917 -0.2317 1.615
## factor(synergy)4 1.3532 0.4687 2.238
## factor(synergy)5 1.7000 0.4688 2.931
##
## Residual standard error: 0.4322 on 45 degrees of freedom
## Multiple R-squared: 0.6977, Adjusted R-squared: 0.6708
## F-statistic: 25.96 on 4 and 45 DF, p-value: 3.408e-11
```

What does the model predict the overall score for a restaurant with a synergy score of 3 will be?

**Answer**: 2.5 + 0.6917 = 3.1917

Finally, draw a plot that corresponds to the model you just produced for the synergy score. Hint: Youâ€™ll need the `geom_confint`

layer.

```
ggplot(burrito, aes(factor(synergy), overall)) +
geom_confint()
```

`## Warning: Removed 2 rows containing missing values (geom_pointrange).`