Project A: Data Dictionary with Collected Data

Due: 2018-02-20 (start of class)

Starter code: project-a.Rmd

Rubric: project-a-rubric.csv

Questions deadline: 2018-02-16, 5pm (Friday). If you would like help with the project, please see me or e-mail me before this deadline. After this time, I will only help with technical issues, such as R crashing or GitHub being down.

The overarching goal of the first project is to collect a data set, produce a data dictionary, and provide a basic exploration of the variables from your data.

Please review the rubric linked above before submitting your report, and make sure you have followed all of the instructions. Note that the relationship between points and letter grades may not follow the standard conventions (that is, 90% may not be an A-), and components generally will not receive partial credit.

The first step is to decide what data set you would like to collect. The most important aspect of choosing your data set is to pick something that has an overarching object of study. This is an exploratory data analysis (EDA), so you do not need a concrete yes/no question; rather, you should have a general theme that you want to understand. For example, instead of a question such as “Do male students typically eat later than female students?”, you should have a question along the lines of “What patterns exist in the dining times of UR students?”

Your data set must meet the following specifications:

  • at least 75 observations
  • at least 5 variables, including:
    • at least 2 numeric variables
    • at least 1 character variable recording categories
  • not externally available
  • uploaded to GitHub as a CSV file along with your report (your RMarkdown document should read the data directly from GitHub)
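As an optional self-check, here is a minimal R sketch that tests a data frame against the minimum requirements above. The function name check_project_data, the example data, and the GitHub path are hypothetical placeholders; they are not part of the starter code. In R versions before 4.0, read.csv() converts character columns to factors by default, so the helper counts factor columns as categorical too. Passing stringsAsFactors = FALSE keeps them as character.

```r
# Hypothetical helper (not part of the starter code): checks a data frame
# against the project's minimum size and variable-type requirements.
check_project_data <- function(df) {
  # Count numeric columns (integer and double both count as numeric).
  n_numeric <- sum(vapply(df, is.numeric, logical(1)))
  # Count categorical columns, stored either as character or as factor.
  n_categorical <- sum(vapply(df, function(x) is.character(x) || is.factor(x),
                              logical(1)))
  c(
    at_least_75_obs        = nrow(df) >= 75,
    at_least_5_vars        = ncol(df) >= 5,
    at_least_2_numeric     = n_numeric >= 2,
    at_least_1_categorical = n_categorical >= 1
  )
}

# Reading from GitHub: use the "raw" file URL. This path is a placeholder.
# dat <- read.csv("https://raw.githubusercontent.com/<user>/<repo>/master/data.csv",
#                 stringsAsFactors = FALSE)

# Small made-up example so the check can run without network access.
example <- data.frame(
  id         = 1:80,
  minutes    = rep(c(5, 12, 30, 45), times = 20),
  group_size = rep(1:4, each = 20),
  direction  = rep(c("north", "south"), times = 40),
  activity   = rep(c("talking", "texting", "walking", "walking"), times = 20),
  stringsAsFactors = FALSE
)
check_project_data(example)
```

The result is a named logical vector. For your own data, every element should be TRUE before you submit.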

This is a chance to do something creative. Here are just a few suggestions:

  • pick a favorite magazine and collect data about the ads: what page each ad is on, what the product is, how many words there are, how many people are in the ad, etc.
  • sit on a bench somewhere during a busy time of day and record information about groups of people walking by: How many people are in the group? Which direction are they traveling? Are they talking, texting, or just walking?
  • make a data set with one observation for each of a number of people you know; record the last date you saw each person, and give each person a score from 1 to 10 for how much you like them
  • watch TV for a few hours and collect data about the ads: the product in question, the time of the ad, the length of the ad, etc.
  • make a desperate plea on social media for data (hopefully you have more than 75 friends); append a column for how well you know each person

You’re not restricted to these ideas. In fact, I encourage you to find something else interesting to explore. For details on the format of the report and how it will be graded, please see the rubric and starter code.