Project Description:

This project is worth 25 points and is due on Canvas on December 10th at midnight. The objective of this project is to engage you in a hands-on exploration of statistical analysis. In this project, you will demonstrate your understanding of the main tenets of statistical description and inference that we have learned in this class. You will start by selecting and investigating two variables from one of three provided datasets: Dataset A, Dataset B, or Dataset C. A detailed description of each dataset is given below.

Data


Dataset A: US Cereals Data

The UScereal dataset has 65 rows (cereals) and 11 columns (nutrition facts). The data come from the 1993 ASA Statistical Graphics Exposition and are taken from the mandatory FDA food label. The data have been normalized here to a portion of one American cup.

This dataset contains the following variables:

* Mfr (qualitative) - manufacturer, represented by its first initial: G=General Mills, K=Kelloggs, N=Nabisco, P=Post, Q=Quaker Oats, R=Ralston Purina.
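Because Mfr is a qualitative variable stored as single-letter codes, a frequency table is a natural first summary. The sketch below is one possible way to decode the initials and count cereals per manufacturer. The code-to-name mapping comes from the description above; the function name `manufacturer_counts` and the `sample_codes` list are hypothetical and used only for illustration (the sample codes are placeholders, not values from the real data).

```python
from collections import Counter

# Manufacturer codes exactly as listed in the variable description above.
MFR_NAMES = {
    "G": "General Mills",
    "K": "Kelloggs",
    "N": "Nabisco",
    "P": "Post",
    "Q": "Quaker Oats",
    "R": "Ralston Purina",
}


def manufacturer_counts(codes):
    """Return {manufacturer name: count} for an iterable of Mfr codes.

    Hypothetical helper: codes not in MFR_NAMES are ignored, and every
    listed manufacturer is reported, even with a count of zero.
    """
    counts = Counter(codes)
    return {name: counts.get(code, 0) for code, name in MFR_NAMES.items()}


# Placeholder codes for illustration only; NOT values from the real dataset.
sample_codes = ["G", "K", "K", "P", "G", "Q"]

for name, count in manufacturer_counts(sample_codes).items():
    print(f"{name}: {count}")
```

In practice, the list of codes would come from the Mfr column of whichever file or software you use to load Dataset A.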

References:

Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. Third Edition. Springer.

A detailed description of this data can be found here


Dataset B: Infant Birthweights

This dataset contains 10 variables. The data were collected at the Baystate Medical Center, Springfield, Massachusetts, during 1986. The data describe risk factors associated with low infant birth weight.

This dataset contains the following variables:

References:

Venables, W. N. and Ripley, B. D. (2002) Modern Applied Statistics with S. Fourth edition. Springer.

Hosmer, D. W. and Lemeshow, S. (1989) Applied Logistic Regression. New York: Wiley.

Dataset C: Car Sales 1993

This dataset contains 27 variables for 93 new cars for the 1993 model year. Measures given include price, mpg ratings, engine size, body size, and indicators of features. Cars were selected at random from among 1993 passenger car models that were listed in both the Consumer Reports issue and the PACE Buying Guide. Pickup trucks and Sport/Utility vehicles were eliminated due to incomplete information in the Consumer Reports source. Duplicate models (e.g., Dodge Shadow and Plymouth Sundance) were listed at most once.

This dataset contains the following variables:

References:

Venables, W. N. and Ripley, B. D. (1999) Modern Applied Statistics with S-PLUS. Third Edition. Springer.

Lock, R. H. (1993) 1993 New Car Data. Journal of Statistics Education 1(1). doi: 10.1080/10691898.1993.11910459