Project Description:

This project is worth 25 points and is due in Canvas on May 10th at midnight. The objective is to engage you in a hands-on exploration of statistical analysis, in which you will demonstrate your understanding of the main tenets of statistical description and inference that we have learned in this class.

For this project, you will need to find data related to a topic that interests you, such as finance, medicine, sports, or ecology. From this data, choose one or two variables that relate to a statistical question that interests you and analyze them using the appropriate methods. I have provided three sets of data on the course website, and I have included several websites (links below) that can be used to explore publicly available datasets.

Complete the instructions for steps 1-3 to conduct a thorough statistical analysis of your chosen variables. Write up your results as a short report (1-5 pages). You may use any statistical software available to you (for example: Microsoft Excel, R, SAS, Python, MATLAB, SPSS, or other). The applications in the “resources” tab on the course website may also be used, and I have provided several links to online software that can be used to complete your analyses.

Key Learning Objectives:

  • Gain proficiency in selecting and handling real-world datasets.
  • Develop skills in using descriptive statistics to characterize the distribution of variables.
  • Formulate testable hypotheses based on observed patterns or relationships in the data.
  • Apply appropriate statistical tests to evaluate hypotheses and draw meaningful conclusions.
  • Enhance data visualization skills through the creation of descriptive plots.

Assessment Criteria:

Deliverables:

Please read all instructions carefully.

Project Steps:

1. Variable Selection:

Choose one or two variables from a dataset that interests you or one of the provided datasets.

  • (A) Provide some background on the dataset you chose, such as the topic it relates to, how it was gathered, and the sampling method, if available. Provide a link or appropriate reference.

  • (B) Articulate a statistical question related to the dataset you chose and select one or two variables that relate to this question.

  • (C) Write a short description of the variable(s) and how they relate to the question you are interested in exploring.

2. Statistical Description:

  • (A) Utilizing descriptive statistics, provide a comprehensive overview of each selected variable. For quantitative variables, compute the minimum, Q1, median, mean, Q3, maximum, interquartile range (IQR), and the standard deviation and provide them in a table format (see below). For categorical variables, construct a frequency table and identify the modal category.

    Statistic            | Value
    ---------------------+-------
    minimum              |
    Q1                   |
    median               |
    mean                 |
    Q3                   |
    maximum              |
    interquartile range  |
    standard deviation   |

  • (B) Create at least one descriptive plot for each variable (e.g., histogram, boxplot, dot plot, stem-and-leaf plot, Pareto chart, pie chart, or bar chart) to visualize its distribution. Be sure to include a caption for each plot.

  • (C) Using the descriptive statistics and plots from parts A and B, write a short description of the shape, center, and variability of your chosen variables. Make note of any outliers and how they may influence your results.
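
As an illustration of step 2(A) for a quantitative variable, here is a minimal sketch using only Python's standard library. The data values and the function names `describe` and `iqr_outliers` are made up for this example. The 1.5 × IQR rule for flagging outliers is one common convention and is an assumption here; use whichever rule we covered in class. Quartile values can also differ slightly between software packages because they use different interpolation methods.

```python
import statistics


def describe(values):
    """Return the summary statistics listed in the step 2(A) table."""
    data = sorted(values)
    # method="inclusive" is one of several quartile conventions;
    # other software may report slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "minimum": data[0],
        "Q1": q1,
        "median": median,
        "mean": statistics.mean(data),
        "Q3": q3,
        "maximum": data[-1],
        "interquartile range": q3 - q1,
        "standard deviation": statistics.stdev(data),  # sample SD (divides by n - 1)
    }


def iqr_outliers(values, k=1.5):
    """Flag values beyond k * IQR from the quartiles (assumed rule of thumb)."""
    s = describe(values)
    low = s["Q1"] - k * s["interquartile range"]
    high = s["Q3"] + k * s["interquartile range"]
    return [v for v in values if v < low or v > high]


# Hypothetical sample, invented purely for illustration.
sample = [2, 4, 4, 5, 7, 9, 10, 30]
summary = describe(sample)

print(f"{'Statistic':<22}Value")
for name, value in summary.items():
    print(f"{name:<22}{value:.3f}")
print("Possible outliers:", iqr_outliers(sample))
```

In this made-up sample the single large value pulls the mean above the median, which is the kind of influence part (C) asks you to comment on.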

3. Hypothesis Formulation and Testing:

Building on the statistical description in Part 2, formulate a testable hypothesis (or several, if you analyze the variables separately) related to your question in Part 1. The hypothesis should either explore the relationship between the chosen variables or compare a variable to a population parameter. Choose the statistical test or model that is appropriate for your question. This may be any of the tests we have talked about in class, such as one- and two-sample tests of means and proportions, categorical tests such as one of the \(\chi^2\)-tests, non-parametric tests, bootstrap tests, regression, or ANOVA.

  • (A) Write a short description justifying your choice of hypothesis test or model and how it will help you answer the statistical question you outlined in Part 1. State the null and alternative hypotheses clearly, list the assumptions of the test, and discuss whether your data satisfy them.
  • (B) Conduct the chosen statistical test. Give the critical value, degrees of freedom (if applicable), test statistic, and \(p\)-value of the test. If you are using regression, be sure to report the coefficient of determination, the residual standard deviation, and the coefficient estimates with their standard errors. If you are conducting an ANOVA, be sure to report the appropriate ANOVA table, including the model and error sums of squares, mean squares, F-statistics, and \(p\)-values.
  • (C) Interpret the results of your test(s). State whether you reject or fail to reject the null hypothesis and what that conclusion means for your statistical question in Part 1.
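
To show what the reported quantities in step 3(B) might look like for one of the simpler tests, here is a minimal sketch of a two-sided one-sample z-test for a proportion, using only Python's standard library. The counts (60 successes out of 100), the null value \(p_0 = 0.5\), the significance level of 0.05, and the function name `one_proportion_z_test` are all assumptions made for illustration. The check that \(np_0\) and \(n(1-p_0)\) are both at least 10 is a common rule of thumb for the normal approximation, not a requirement stated in this handout.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n
    # The standard error uses p0 because it is computed under H0.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    critical_value = std_normal.inv_cdf(1 - alpha / 2)
    return {
        "p_hat": p_hat,
        "z": z,
        "critical_value": critical_value,
        "p_value": p_value,
        "reject_null": abs(z) > critical_value,
        # Assumed rule of thumb for the normal approximation.
        "normal_approx_ok": n * p0 >= 10 and n * (1 - p0) >= 10,
    }


# Hypothetical counts, invented purely for illustration.
result = one_proportion_z_test(successes=60, n=100, p0=0.5)
for name, value in result.items():
    print(f"{name}: {value}")
```

With these invented numbers the test statistic is about 2.0, the critical value is about 1.96, and the \(p\)-value is about 0.046, so the null hypothesis would be rejected at the 0.05 level. Other tests (t-tests, \(\chi^2\)-tests, regression, ANOVA) report additional quantities such as degrees of freedom and are usually easier to run in dedicated statistical software.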