\({\bf (1)}\) Define the following terms:
\({\bf (2)}\) Explain the difference between a qualitative (categorical) and a quantitative variable
A qualitative variable is a non-numeric characteristic, such as a name or label, while a quantitative variable is a numerical characteristic, such as a measurement or count.
\({\bf (3)}\) Explain the difference between a discrete and a continuous variable and give an example of each
A discrete variable is a quantitative variable that takes on only distinct whole-number (integer) values, such as counts, whereas a continuous variable is a quantitative variable that can assume any value within a certain interval, such as temperature or height.
\({\bf (4)}\) Explain the difference between a nominal and an ordinal variable and give an example of each
A nominal variable is a qualitative variable whose categories have no inherent ordering, such as the names of cities or brands of shoes. An ordinal variable is a qualitative variable whose categories have an inherent ordering, such as a Likert score or education level.
\({\bf (5)}\) At what age did women marry? A historian wants to estimate the average age at marriage of women in New England in the early 19th century. Within her state archives, she finds marriage records for the years 1800 to 1820, which she treats as a sample of all marriage records from the early 19th century. The average age of the women in the records is \(24.1\) years. Using the appropriate statistical method, she estimates that the average age of brides in early 19th-century New England was between \(23.5\) and \(24.7\) years.
Coin flip result | H | H | H | T | H | T | T | T | H | H | T | T | T | H | T |
Compute the frequency table for the variable "coin flip result" and answer the following questions:
Result | Frequency | Relative Frequency |
---|---|---|
H | 7 | 7/15 = 0.47 |
T | 8 | 8/15 = 0.53 |
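The tally in the table can be checked with a few lines of code. The following is a minimal Python sketch (the variable names are illustrative, not part of the original solution) that counts each outcome and divides by the number of flips.

```python
from collections import Counter

# The 15 coin flip results, in the order listed in the problem
flips = ["H", "H", "H", "T", "H", "T", "T", "T",
         "H", "H", "T", "T", "T", "H", "T"]

counts = Counter(flips)   # frequency of each outcome
n = len(flips)            # total number of flips

# Relative frequency = frequency / total number of observations
rel_freq = {outcome: count / n for outcome, count in counts.items()}

for outcome in sorted(counts):
    print(f"{outcome} | {counts[outcome]} | "
          f"{counts[outcome]}/{n} = {rel_freq[outcome]:.2f}")
```

The relative frequencies sum to 1, which is a quick sanity check on any frequency table.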
\({\bf (7)}\) A survey about color preferences reported the age distribution of the people who responded. The results are below:
Age Group (Years) | 1-18 | 19-24 | 25-35 | 36-50 | 51-69 | 70 and over |
---|---|---|---|---|---|---|
Counts | 10 | 97 | 70 | 36 | 14 | 5 |
Relative Frequency | 0.04 | 0.42 | 0.30 | 0.16 | 0.06 | 0.02 |
Use this table to answer parts a - d
(a) Compute the relative frequency for each age group
(b) Make a bar graph where the heights of the bars are relative frequencies
(c) Describe the distribution. The distribution is slightly right-skewed, with most respondents between 19 and 35 years of age.
(d) Explain why your bar graph is not a histogram. A histogram has intervals of equal length, but these age groups have unequal widths.
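Parts (a) and (b) can be reproduced with a short script. This is a rough Python sketch, assuming only the six counts from the table (taken in table order); the text bars are a stand-in for a real bar graph, and the scale of one `#` per 2% is an arbitrary choice.

```python
# Counts for the six age groups, in the order they appear in the table
counts = [10, 97, 70, 36, 14, 5]

total = sum(counts)  # number of respondents

# Relative frequency of a group = group count / total count
rel_freq = [c / total for c in counts]

for c, rf in zip(counts, rel_freq):
    bar = "#" * round(rf * 50)  # crude text bar: one '#' per 2% of respondents
    print(f"{c:3d}  {rf:.2f}  {bar}")
```

The tallest bars belong to the second and third groups, with a tail trailing off through the older groups.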
Type of Spam | Percentage |
---|---|
Adult | 14.5 |
Financial | 16.2 |
Health | 7.3 |
Leisure | 7.8 |
Products | 21.0 |
Scams | 14.2 |
Use the table to answer questions a and b.
\({\bf (9)}\) A farmer in Idaho is interested in the number of rainy days in a given year, so he records the number of rainy (\(R\)), cloudy (\(C\)), and sunny (\(S\)) days over two weeks in May. His observations are \[\{R,R,C,C,C,S,S,C,R,R,C,R,S,S\}\] Use the farmer's observations to answer the following questions.
\({\bf (10)}\) Consider the following data from a survey conducted on college students in the state of Florida. As part of their research, surveyors recorded the high school performance (measured in grade point average - GPA) of 35 college students from across the state. For your convenience the data have been sorted from least to greatest:
\[2.0, 2.1, 2.3, 2.8, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0,\] \[3.0, 3.0, 3.0, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.4,\] \[3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, 3.8,\] \[3.8, 3.8, 3.8, 4.0, 4.0\]
The following frequency table gives the distribution of the variable high school GPA (with values rounded to the nearest 0.5). Fill in the table and answer the following questions:
GPA | Frequency | Relative frequency | Cumulative RF |
---|---|---|---|
2.0 | 2 | 0.06 | 0.06 |
2.5 | 1 | 0.03 | 0.09 |
3.0 | 12 | 0.34 | 0.43 |
3.5 | 14 | 0.40 | 0.83 |
4.0 | 6 | 0.17 | 1.00 |
Using the frequency table of the rounded values, the mode is the value with the highest relative frequency, \(3.5\).
Using the raw data, the mode is the most frequently observed value, which is \(3.0\).
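The frequency table and both modes can be rebuilt from the raw data. The sketch below is one way to do it in Python; the rounding rule (double, round to an integer, halve) is an assumption about how "rounded to the nearest 0.5" was carried out, and it reproduces the table for these particular values.

```python
from collections import Counter
from statistics import mode

# The 35 sorted GPA observations from the problem statement
gpa = [2.0, 2.1, 2.3, 2.8, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0,
       3.0, 3.0, 3.0, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.4,
       3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, 3.8,
       3.8, 3.8, 3.8, 4.0, 4.0]

# Round to the nearest 0.5: double, round to an integer, then halve
rounded = [round(2 * x) / 2 for x in gpa]

freq = Counter(rounded)
n = len(gpa)

cumulative = 0.0
table = []  # rows of (value, frequency, relative frequency, cumulative RF)
for value in sorted(freq):
    rf = freq[value] / n
    cumulative += rf
    table.append((value, freq[value], rf, cumulative))
    print(f"{value:.1f} | {freq[value]:2d} | {rf:.2f} | {cumulative:.2f}")

mode_rounded = mode(rounded)  # most frequent rounded value
mode_raw = mode(gpa)          # most frequent raw value
print(mode_rounded, mode_raw)
```

The two modes differ because rounding pools neighboring values: the scores from 3.3 through 3.7 all land in the 3.5 bin, which then outnumbers the 3.0 bin.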
## [1] "Using the Rounded GPA scores:"
##
## The decimal point is 1 digit(s) to the left of the |
##
## 20 | 00
## 22 |
## 24 | 0
## 26 |
## 28 |
## 30 | 000000000000
## 32 |
## 34 | 00000000000000
## 36 |
## 38 |
## 40 | 000000
## [1] "Using the Raw Data:"
##
## The decimal point is 1 digit(s) to the left of the |
##
## 20 | 0
## 21 | 0
## 22 |
## 23 | 0
## 24 |
## 25 |
## 26 |
## 27 |
## 28 | 0
## 29 |
## 30 | 000000000
## 31 | 0
## 32 | 0
## 33 | 00
## 34 | 0000
## 35 | 0000
## 36 | 00
## 37 | 00
## 38 | 0000
## 39 |
## 40 | 00
\({\bf (11)}\) Which statistic is more resistant to outliers, the mean or median? Why?
The median is more resistant to outliers because it depends only on the middle value (or the middle two values if \(n\) is even). Therefore, how far an extreme value lies from the center does not influence the median. The mean, however, is computed from every value in the sample, so it is pulled in the direction of extreme values.
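A small numerical experiment makes the point concrete. The sample below is hypothetical, invented purely for illustration: replacing its largest value with an extreme one moves the mean from 5 to 17.6, while the median stays at 5.

```python
from statistics import mean, median

# Hypothetical sample, invented purely for illustration
sample = [3, 4, 5, 6, 7]

# Same sample with the largest value replaced by an extreme outlier
with_outlier = [3, 4, 5, 6, 70]

print("mean:  ", mean(sample), "->", mean(with_outlier))
print("median:", median(sample), "->", median(with_outlier))
```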
\({\bf (12)}\) Describe the shape of the following distributions and for each distribution identify if the mean will be larger, smaller or the same as the median.
(a) Skewed right. The mean will be greater than the median because the mean is pulled toward the long right tail of the distribution.
(b) Symmetric/bell-shaped. The mean and the median will be equal.
(c) Skewed left. The mean will be less than the median because the mean is pulled toward the long left tail of the distribution.
\({\bf (13)}\) Consider the following four sets of observations of a quantitative variable \(x\). For your convenience the observations have been sorted in increasing order. Match datasets \(1-4\) with the correct histogram (labeled \(A - D\))
\[\text{Dataset 1} = \{0.1, 1.1, 2.6, 2.7, 3.4, 3.4, 4.1, 4.4, 8.8, 9.6\}\] \[\text{Dataset 2}= \{0.1, 0.3, 1.2, 2.4, 4.4, 4.5, 8.0, 8.9, 9.3, 9.3\}\] \[\text{Dataset 3} = \{1.1, 3.8, 5.3, 6.0, 6.2, 6.9, 7.9, 7.9, 8.1, 8.7\}\] \[\text{Dataset 4} = \{3.4, 4.5, 5.4, 5.6, 7.0, 8.5, 8.9, 9.2, 9.7, 9.7\}\]
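One way to approach the matching is to tabulate each dataset into bins and compare the counts with the bar heights. The sketch below assumes bins of width 2 on the range 0 to 10; the histograms in the figure may use different bins, so `BIN_WIDTH` and the helper `bin_counts` are illustrative assumptions rather than part of the original problem.

```python
from statistics import mean, median

datasets = {
    1: [0.1, 1.1, 2.6, 2.7, 3.4, 3.4, 4.1, 4.4, 8.8, 9.6],
    2: [0.1, 0.3, 1.2, 2.4, 4.4, 4.5, 8.0, 8.9, 9.3, 9.3],
    3: [1.1, 3.8, 5.3, 6.0, 6.2, 6.9, 7.9, 7.9, 8.1, 8.7],
    4: [3.4, 4.5, 5.4, 5.6, 7.0, 8.5, 8.9, 9.2, 9.7, 9.7],
}

BIN_WIDTH = 2  # assumed bin width; the figure's histograms may differ


def bin_counts(values, width=BIN_WIDTH, upper=10):
    """Count observations in [0, width), [width, 2*width), ... up to `upper`."""
    counts = [0] * (upper // width)
    for v in values:
        # A value equal to `upper` is placed in the last bin
        index = min(int(v // width), len(counts) - 1)
        counts[index] += 1
    return counts


for label, values in datasets.items():
    print(f"Dataset {label}: bins={bin_counts(values)}, "
          f"mean={mean(values):.2f}, median={median(values):.2f}")
```

Comparing the mean with the median also hints at the direction of skew: Dataset 1 has a mean above its median, while Dataset 4 has a mean below its median.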
\({\bf (14)}\) Consider the following set of 10 observations of a variable \(X\) sorted from least to greatest: \[3.3, 3.8, 4.0, 4.8, 4.8, 5.1, 5.2, 5.6, 5.7, 6.9\] Use the data to answer parts a-b
\({\bf (15)}\) Consider the following \(n = 20\) observations of the sugar and sodium content of several popular cereal brands and answer questions a - g:
Brand | Sodium (mg) | Sugar (g) | Type |
---|---|---|---|
Frosted Mini Wheats | 0 | 11 | A |
Raisin Bran | 340 | 18 | A |
All Bran | 70 | 5 | A |
Apple Jacks | 140 | 14 | C |
Cap’n Crunch | 200 | 12 | C |
Cheerios | 180 | 1 | C |
Cinnamon Toast Crunch | 210 | 10 | C |
Crackling Oat Bran | 150 | 16 | A |
Fiber One | 100 | 0 | A |
Frosted Flakes | 130 | 12 | C |
Froot Loops | 140 | 14 | C |
Honey Bunches of Oats | 180 | 7 | A |
Honey Nut Cheerios | 190 | 9 | C |
Life | 160 | 6 | C |
Rice Krispies | 290 | 3 | C |
Honey Smacks | 50 | 15 | A |
Special K | 220 | 4 | A |
Wheaties | 180 | 4 | A |
Corn Flakes | 200 | 3 | A |
Honeycomb | 210 | 11 | C |
Sugar (g) | Frequency | Relative frequency | Cumulative RF |
---|---|---|---|
≤ 2 (g) | 2 | 0.10 | 0.10 |
2 - 4 (g) | 4 | 0.20 | 0.30 |
4 - 6 (g) | 2 | 0.10 | 0.40 |
6 - 8 (g) | 1 | 0.05 | 0.45 |
8 - 10 (g) | 2 | 0.10 | 0.55 |
10 - 12 (g) | 4 | 0.20 | 0.75 |
12 - 14 (g) | 2 | 0.10 | 0.85 |
14 - 16 (g) | 2 | 0.10 | 0.95 |
> 16 (g) | 1 | 0.05 | 1.00 |
(b) Compute the mean, median, and mode of the variable "Sugar (g)". Using the raw data, there are 5 modes, each with a frequency of 2: mode(s) \(= 3, 4, 11, 12, 14\). The median is \(\frac{9+10}{2} = 9.5\), and the mean is \[\bar{x} = \frac{1}{20}\sum_{i=1}^{20} x_i = \frac{11+18+5+14+\dots+3+11}{20} = 8.75\] (Note that the mean may be slightly different if you used the frequency table above.)
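These three statistics can be verified from the raw table values. The following Python sketch uses the standard library's `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median, multimode

# Sugar (g) for the 20 cereals, in table order
sugar = [11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
         14, 7, 9, 6, 3, 15, 4, 4, 3, 11]

x_bar = mean(sugar)                # sum of the values divided by n
med = median(sugar)                # average of the 10th and 11th sorted values
modes = sorted(multimode(sugar))   # every value tied for the highest frequency

print(x_bar, med, modes)
```

`multimode` is used instead of `mode` because the data have several values tied for the highest frequency.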
(c) Compute the variance and standard deviation of the variable "Sugar (g)". \[s^2 = \frac{1}{19}\sum_{i=1}^{20} (x_i-\bar{x})^2 = \frac{(11 - 8.75)^2 + (18-8.75)^2 + \dots+(3-8.75)^2 + (11 - 8.75)^2}{19}\] \[ = \frac{5.06 + 85.56 + \dots+33.06+5.06}{19} = 28.3\] \[s = \sqrt{s^2} = \sqrt{28.3} = 5.32\]
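The sample variance sums the squared deviations over all \(n = 20\) observations and divides by \(n - 1 = 19\). As a check on the arithmetic, here is a short Python sketch that computes it from the definition and compares the result with the standard library's `statistics.variance`.

```python
from math import sqrt
from statistics import mean, variance

# Sugar (g) for the 20 cereals, in table order
sugar = [11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
         14, 7, 9, 6, 3, 15, 4, 4, 3, 11]

n = len(sugar)
x_bar = mean(sugar)

# Sample variance: squared deviations summed over all n values, divided by n - 1
s2 = sum((x - x_bar) ** 2 for x in sugar) / (n - 1)
s = sqrt(s2)

# Cross-check against the standard library's sample variance
s2_check = variance(sugar)

print(f"s^2 = {s2:.1f} (library: {s2_check:.1f}), s = {s:.2f}")
```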
(d) Compute the quartiles (Q1, Q2, Q3) and interquartile range of the variable "Sugar (g)". \[Q1 = 4, \ Q2 = \text{median} = 9.5, \ Q3 = 13, \ IQR = 13 - 4 = 9\]
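The quartiles above are consistent with the median-of-halves convention: Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half. The sketch below implements that convention in Python; the helper name `quartiles` is hypothetical, and other conventions (including those used by some statistical software) can give slightly different values.

```python
from statistics import median

# Sugar (g) for the 20 cereals, in table order
sugar = [11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
         14, 7, 9, 6, 3, 15, 4, 4, 3, 11]


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]                   # values below the median position
    upper = data[half + len(data) % 2:]   # values above it (middle value skipped if n is odd)
    return median(lower), median(data), median(upper)


q1, q2, q3 = quartiles(sugar)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```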
(e) Create a box and whisker plot for the variable "Sugar (g)" and mark the Q1, Q2, and Q3 quartiles, the median, and any potential outliers.
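The problem does not state which rule to use for flagging outliers. A common convention, assumed here, treats any observation more than \(1.5 \times IQR\) below Q1 or above Q3 as a potential outlier. The Python sketch below computes those fences and the whisker endpoints from the quartiles found in part (d).

```python
# Sugar (g) for the 20 cereals, in table order
sugar = [11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
         14, 7, 9, 6, 3, 15, 4, 4, 3, 11]

q1, q3 = 4, 13          # quartiles from part (d)
iqr = q3 - q1

# Assumed 1.5 * IQR rule for potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in sugar if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme observations inside the fences
whisker_low = min(x for x in sugar if x >= lower_fence)
whisker_high = max(x for x in sugar if x <= upper_fence)

print(lower_fence, upper_fence, outliers, whisker_low, whisker_high)
```

Under this rule no cereal is flagged, so the whiskers run from the smallest to the largest observed sugar content.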