06 April 2021 08 April 2021
This page outlines the instructions for the first project. You should have a file
project03.Rmd in your RStudio workspace where you can work on the project. I find that students prefer having a consistent format for the projects, so I will attempt to keep the format the same throughout the semester.
On the due-date, your group is responsible for completing three elements:
As described on the syllabus, the project will be graded as either Satisfactory or Unsatisfactory. I will provide additional feedback that you can address in the next project.
The data for this project is similar to the previous one, but this time reviews come from Yelp. I created the data set based on what is provided by the Yelp Open Dataset project. As with before, I have selected reviewers with a large number of reviews. However, this time you have a number of different variables to work with. Variables that are available to focus on (either as a prediction or as an unsupervised grouping variable) are:
As with the first two projects, you are encouraged to take the data in whatever direction you find most interesting. The only thing I require is that you focus a non-trivial amount of your work on integrating the unsupervised approaches covered in Notebook13 into your analysis. Note that gender has too few categories to use unsupervised on its own and business names likely have too many to do supervised learning.
If you need some questions to get started, I am happy to provide some of these. However, there are so many directions to go in, I only want to do this if you are struggling with an idea.
Each group is working with a different city’s data. You should be able to download your data set from within the
While working through the project, I typically find that many groups ask for help writing the same bits of code. Any notes that I want to share about how to do specific tasks will be added here as we work through the project.