**Due**: 2018-12-04 (start of class)

**Starter code**: project-iv.Rmd, project-iv-lab.Rmd

**Rubric**: project-iv-rubric.csv

For this project you are going to select a topic in statistical programming or data analysis that we have not covered. You will create a set of class tutorial notes and a lab similar to the ones we have used this semester. During the last week of the semester, you will give a brief presentation of the tutorial. I will make all of these available on the class website so that everyone can read and benefit from them.

Here are some ideas that would make good topics. I am open to other options, and we will discuss other possibilities in class:

- using the `glm` function to fit a logistic regression
- using the `glm` function to fit a Poisson regression
- how to visualize network/graph data (**igraph**)
- using the `melt` function (see **reshape**)
- using the `gather` function (see **tidyr**)
- loading, writing, and basic manipulation of image data
- running a topic model over a corpus of texts
- running and tuning a random forest model (**randomForest**)
- running and visualizing gradient boosted trees (see **gbm**)
- dealing with missing values
- how to make interactive graphics with `ggplotly` (see **plotly**)
- making use of “base” graphics (see the function `plot`)
- using lasso-penalized regression (**glmnet**)
- fitting an autoregressive model (see the `arima()` function)
- fitting a moving average model (see the `arima()` function)
- using the Kolmogorov-Smirnov test (see `ks.test()`)

Note that, as part of the tutorial, you should find 1-2 datasets that you can use to demonstrate the material and to base the lab questions on. I suggest setting up your lab to include roughly 10 questions. I will meet with everyone while you are working on this project to make sure the scope of each tutorial is neither too narrow nor too broad.