Project

The MS&E 125 project provides hands-on experience with key steps of the data science pipeline:

  • Asking research questions
  • Identifying dataset(s) to help you answer your questions
  • Cleaning, exploring, and analyzing datasets using tools from 125 and beyond
  • Synthesizing and compiling your results in a short report
  • Presenting your results to an audience

The project will be completed in teams of 3. You are free to pursue any topic related to applied statistics. In previous years, teams have considered athletic performance, gender inequality, farming practices, restaurant quality, music success, gentrification, and standardized testing, just to name a few.

Requirements

With your project, we expect you to:

  • Come up with a research question. This should be an open-ended question that can be answered with applied statistics tools. You should settle on your research question before you start any analysis, although you may need to revise it based on what data is attainable.
  • Find one or more datasets that you can use to answer your research question. You should spend some time surveying the internet for available datasets before committing to a research question. Don’t underestimate how much time it takes to find and prepare data!
  • Do at least the following as part of your analysis:
    • Make 2 informative plots.
    • Conduct 1 hypothesis test that does not involve a regression model.
    • Fit at least 1 regression model.
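
As a rough illustration of the last two analysis items, here is a minimal sketch in Python using only the standard library. The language choice, the function names (permutation_test, fit_line), and all of the numbers are assumptions made for this example; they are not course-provided code or data, and your own project may well call for a different test or model. The sketch runs a two-sided permutation test for a difference in group means (a hypothesis test that involves no regression model) and then fits a one-variable least-squares line.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean of a minus mean of b) and an
    approximate p-value.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Synthetic numbers, made up purely for illustration.
    group_a = [12.1, 13.4, 11.8, 14.0, 12.9, 13.7]
    group_b = [10.2, 11.1, 10.8, 11.9, 10.5, 11.4]
    diff, p = permutation_test(group_a, group_b)
    print(f"difference in means: {diff:.2f}, permutation p-value: {p:.4f}")

    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
    slope, intercept = fit_line(x, y)
    print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
```

In a real project you would replace the synthetic lists with columns from your cleaned dataset, and you would likely use a statistics package rather than hand-written functions.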

Please see the grading rubric for further details.

Checkpoints and Deliverables

  • Initial Proposal Meeting: each team is required to attend a 15-minute project proposal meeting with a member of the course staff. Teams should come to this meeting with a research question and the dataset(s) they plan to use. The course staff member from this meeting will serve as the team’s point of contact for the project for the rest of the quarter. Teams will receive full credit as long as they come reasonably prepared. Please use this spreadsheet to sign up for an initial meeting slot. These meetings will happen the week of April 21st.
  • Proposal: each team will write a 1-page proposal that describes their plan for the project. This proposal should include the team’s research question, a description of the dataset(s) that the team plans to use, and 1 informative plot made using the team’s data. The proposal should build on the suggestions made during the initial meeting with course staff.
  • Report: at the end of the quarter, each team will write a 4-page report with the findings from their analysis. You may include additional results in an appendix, but graders are not required to read appendices. The report should be single-spaced and include relevant plots.
  • Presentation: also at the end of the quarter, each team will deliver a 10-minute presentation to their designated course staff member. These presentations will be live on Zoom. Teams will schedule their presentations based on the availability of their designated course staff member. Each presentation slot lasts 15 minutes total: the 10-minute presentation followed by 5 minutes reserved for questions. Please use this spreadsheet to sign up for a presentation slot. These presentations will happen during either week 10 or finals week.

Important Dates

  • Project Proposal: due Wednesday May 7th at 11:59pm.
  • Report: due Monday June 9th at 11:59pm.