STAT/MATH/CS 395: Statistical Learning
A major part (34%) of your grade for this course comes from the course's final project. The basic idea is that you will write up a report similar to the labs but with significantly more exposition about what you are doing and what the results show. For a good benchmark, think of what my lecture notes look like. Not too formal, but all of the ideas written out in words. In the last two weeks of the semester, we will have presentations based on these write-ups.
The overarching goal of the assignment is for you to apply what we have learned this semester to a dataset that has not been nicely cleaned for analysis. The skills we have covered are very applicable, and I want to make sure you all feel comfortable applying them to real datasets outside of the rigid format I gave you in the course.
I am open to alternative ideas, but I see there being four main approaches to the final project.
- kaggle: choose a Kaggle machine learning competition and write up your attempts to build a predictive model. Generally, make sure that the dataset is sufficiently difficult by choosing something that includes multiple input files or works with text, image, or other non-tabular data sources.
- data collection: create a new dataset yourself, load it into R, and apply statistical learning techniques to the data. This could in theory be from any source, but to be interesting it will probably involve either collecting another photo dataset or curating a dataset of texts.
- new method: find a new method and/or R package that we have not covered in class and apply this to your dataset. Ideas include working with time series, network data, or using Bayesian models. Your write-up should include background on the method similar to what my lecture notes cover.
Submission details:
- You should upload your project on your GitHub page as a file named final_project.Rmd as well as the knit version final_project.html. You should also upload the dataset you are using, unless it can be downloaded from the web via a direct link (in which case, please provide the link). If this is not feasible, please let me know.
- I will copy your final project HTML file to the course website, so be aware that these will be generally accessible. The file should include your name at the top. If there are special circumstances where you would not like this to be publicly available, please let me know ahead of time.
- You should submit your final project by noon on the day on which you present.
Presentation schedule and details:
- Presentations will take place in class during the final four class periods. This gives each student approximately 10 minutes.
- On Thursday, 09 November, I will ask you all to write down your proposed topic and preferred presentation dates. If you will not be in class, please email me this information.
- By 10 November, I will release a schedule of who is presenting when. I will also let you know if your topic is inappropriate or needs some clarification by that point.
- If everyone wants to go on the same days, I will find a fair way of devising the schedule.
- Note that I expect you all to be present during all presentations. I have a strict no-laptop / no-cellphone policy during presentations.
- If you behave, there may be snacks.
- There will be a final lab 24, but no further labs after that (it is a fun, easy lab to finish the semester with).
- You are done (other than attending the final classes) once you finish the final project.
I initially thought of creating a rubric for the final project, but I am not sure that this would be very helpful for you all. Just keep in mind that there should be much more expository text than in the labs, and you should practice your presentation at least once.