\({\bf (1)}\) Define the following terms:

\({\bf (2)}\) Explain the difference between a qualitative (categorical) and a quantitative variable

A qualitative variable is a non-numeric characteristic, such as a name or label, while a quantitative variable is a numerical characteristic, such as a measurement or count.

\({\bf (3)}\) Explain the difference between a discrete and a continuous variable and give an example of each

A discrete variable is a quantitative variable that takes on only distinct whole-number (integer) values, such as counts, whereas a continuous variable is a quantitative variable that can assume any value within a certain interval, such as temperature or height.

\({\bf (4)}\) Explain the difference between a nominal and an ordinal variable and give an example of each

A nominal variable is a qualitative variable whose categories have no inherent ordering, such as city names or shoe brands. An ordinal variable is a qualitative variable whose categories have an inherent ordering, such as a Likert score or education level.

\({\bf (5)}\) At what age did women marry? A historian wants to estimate the average age at marriage of women in New England in the early 19th century. Within her state archives, she finds marriage records for the years 1800 to 1820, which she treats as a sample of all marriage records from the early 19th century. The average age of the women in the records is \(24.1\) years. Using the appropriate statistical method, she estimates that the average age of brides in early 19th-century New England was between \(23.5\) and \(24.7\) years.

\({\bf (6)}\) Consider the following data obtained from flipping a coin 15 times. A value of \(H\) denotes that the flip came up heads, whereas a value of \(T\) denotes that it came up tails.
\[H, H, H, T, H, T, T, T, H, H, T, T, T, H, T\]

Compute the frequency table for the variable "coin flip result" and answer the following questions:

| Result | Frequency | Relative Frequency |
|---|---|---|
| H | 7 | 7/15 = 0.47 |
| T | 8 | 8/15 = 0.53 |
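The frequency table above can be checked with a short script. This is a minimal Python sketch, not part of the original solution; the variable names are illustrative.

```python
from collections import Counter

# The 15 coin-flip results from the problem statement.
flips = "H H H T H T T T H H T T T H T".split()

counts = Counter(flips)  # frequency of each outcome
n = len(flips)           # total number of flips

# Relative frequency = frequency / total, rounded to two decimals as in the table.
rel_freq = {outcome: round(count / n, 2) for outcome, count in counts.items()}

for outcome in sorted(counts):
    print(outcome, counts[outcome], rel_freq[outcome])
```

The two relative frequencies sum to 1, which is a quick sanity check on any frequency table.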

\({\bf (7)}\) A survey about color preferences reported the age distribution of the people who responded. The results are below.

| Age Group (Years) | 1-18 | 19-24 | 25-35 | 36-50 | 51-69 | 70 and over |
|---|---|---|---|---|---|---|
| Counts | 10 | 97 | 70 | 36 | 14 | 5 |
| Relative Frequency | 0.04 | 0.42 | 0.30 | 0.16 | 0.06 | 0.02 |

Use this table to answer parts a - d
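Each relative frequency in the table is a count divided by the total number of respondents. As a rough check, assuming the counts are listed from the youngest to the oldest age group, the row can be recomputed as follows (a sketch with illustrative names):

```python
# Counts per age group, in table order (youngest group first).
counts = [10, 97, 70, 36, 14, 5]

total = sum(counts)  # total number of respondents

# Relative frequency of each group, rounded to two decimals as in the table.
rel_freq = [round(c / total, 2) for c in counts]

print(total)
print(rel_freq)
```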

\({\bf (8)}\) Email spam is the curse of the internet. The table below gives a compilation of the most common types of spam.

| Type of Spam | Percentage |
|---|---|
| Adult | 14.5 |
| Financial | 16.2 |
| Health | 7.3 |
| Leisure | 7.8 |
| Products | 21.0 |
| Scams | 14.2 |

Use the table to answer questions a and b.

\({\bf (9)}\) A farmer in Idaho is interested in the number of rainy days in a given year, so he records the rainy (\(R\)), cloudy (\(C\)), and sunny (\(S\)) days over two weeks in May. His observations are \[\{R,R,C,C,C,S,S,C,R,R,C,R,S,S\}\] Use the farmer's observations to answer the following questions.

\({\bf (10)}\) Consider the following data from a survey conducted on college students in the state of Florida. As part of their research, surveyors recorded the high school performance (measured in grade point average - GPA) of 35 college students from across the state. For your convenience the data have been sorted from least to greatest:

\[2.0, 2.1, 2.3, 2.8, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0,\] \[3.0, 3.0, 3.0, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.4,\] \[3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, 3.8,\] \[3.8, 3.8, 3.8, 4.0, 4.0\]

The following frequency table gives the distribution of the variable high school GPA (with values rounded to nearest 0.5). Fill in the table and answer the following questions:

| GPA | Frequency | Relative frequency | Cumulative RF |
|---|---|---|---|
| 2.0 | 2 | 0.06 | 0.06 |
| 2.5 | 1 | 0.03 | 0.09 |
| 3.0 | 12 | 0.34 | 0.43 |
| 3.5 | 14 | 0.40 | 0.83 |
| 4.0 | 6 | 0.17 | 1.00 |

Using the frequency table of the rounded values, the mode is the value with the highest relative frequency, which is \(3.5\).

Using the raw data, the mode is the most frequently observed value, which is \(3.0\).
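The frequency table and both modes can be reproduced from the 35 raw GPA values. The sketch below is one way to do it in Python; rounding to the nearest 0.5 by doubling, rounding, and halving is an assumption about how the rounded values were obtained, although it does reproduce the table.

```python
from collections import Counter
from statistics import multimode

# The 35 high school GPA values from the problem statement.
gpas = [2.0, 2.1, 2.3, 2.8, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0,
        3.0, 3.0, 3.0, 3.1, 3.2, 3.3, 3.3, 3.4, 3.4, 3.4,
        3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6, 3.7, 3.7, 3.8,
        3.8, 3.8, 3.8, 4.0, 4.0]

# Round each GPA to the nearest 0.5: double it, round to an integer, halve it.
rounded = [round(2 * g) / 2 for g in gpas]

freq = Counter(rounded)
n = len(gpas)

# Build rows of (value, frequency, relative frequency, cumulative relative frequency).
rows = []
cumulative = 0
for value in sorted(freq):
    cumulative += freq[value]
    rows.append((value, freq[value],
                 round(freq[value] / n, 2), round(cumulative / n, 2)))

for row in rows:
    print(row)

# multimode returns every value tied for the highest count.
rounded_mode = multimode(rounded)
raw_mode = multimode(gpas)
print(rounded_mode, raw_mode)
```

The two modes differ because rounding pools nearby values: the values 3.3 through 3.7 all fall into the 3.5 bin, which then outnumbers the 3.0 bin.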

## [1] "Using the Rounded GPA scores:"
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   20 | 00
##   22 | 
##   24 | 0
##   26 | 
##   28 | 
##   30 | 000000000000
##   32 | 
##   34 | 00000000000000
##   36 | 
##   38 | 
##   40 | 000000
## [1] "Using the Raw Data:"
## 
##   The decimal point is 1 digit(s) to the left of the |
## 
##   20 | 0
##   21 | 0
##   22 | 
##   23 | 0
##   24 | 
##   25 | 
##   26 | 
##   27 | 
##   28 | 0
##   29 | 
##   30 | 000000000
##   31 | 0
##   32 | 0
##   33 | 00
##   34 | 0000
##   35 | 0000
##   36 | 00
##   37 | 00
##   38 | 0000
##   39 | 
##   40 | 00

\({\bf (11)}\) Which statistic is more resistant to outliers, the mean or median? Why?

The median is more resistant to outliers because it depends only on the middle value (or the middle two values if \(n\) is even). Therefore, how far an extreme value lies from the center does not influence the median. The mean, however, is computed from every value in the sample, so it is pulled in the direction of extreme values.
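A small numerical illustration makes the contrast concrete. The sample below is hypothetical (it does not come from the problem set); replacing its largest value with an extreme one changes the mean but leaves the median untouched.

```python
from statistics import mean, median

# Hypothetical sample, chosen only for illustration.
sample = [3, 4, 5, 6, 7]

# The same sample with the largest value replaced by an extreme outlier.
with_outlier = [3, 4, 5, 6, 70]

print("mean:  ", mean(sample), "->", mean(with_outlier))
print("median:", median(sample), "->", median(with_outlier))
```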

\({\bf (12)}\) Describe the shape of the following distributions and for each distribution identify if the mean will be larger, smaller or the same as the median.

(a) Skewed right. The mean will be greater than the median because the mean is pulled toward the long right tail of the distribution.

(b) Symmetric/bell-shaped. The mean and the median will be equal.

(c) Skewed left. The mean will be less than the median because the mean is pulled toward the long left tail of the distribution.

\({\bf (13)}\) Consider the following four sets of observations of a quantitative variable \(x\). For your convenience the observations have been sorted in increasing order. Match datasets \(1-4\) with the correct histogram (labeled \(A - D\))

\[\text{Dataset 1} = \{0.1, 1.1, 2.6, 2.7, 3.4, 3.4, 4.1, 4.4, 8.8, 9.6\}\]
\[\text{Dataset 2} = \{0.1, 0.3, 1.2, 2.4, 4.4, 4.5, 8.0, 8.9, 9.3, 9.3\}\]
\[\text{Dataset 3} = \{1.1, 3.8, 5.3, 6.0, 6.2, 6.9, 7.9, 7.9, 8.1, 8.7\}\]
\[\text{Dataset 4} = \{3.4, 4.5, 5.4, 5.6, 7.0, 8.5, 8.9, 9.2, 9.7, 9.7\}\]
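One way to narrow down the matching is to compare the mean and median of each dataset, since a mean above the median suggests a right skew and a mean below it suggests a left skew. The Python sketch below computes both; it treats the comparison as a rule of thumb only and does not claim which histogram belongs to which dataset.

```python
from statistics import mean, median

# The four datasets from the problem statement.
datasets = {
    "Dataset 1": [0.1, 1.1, 2.6, 2.7, 3.4, 3.4, 4.1, 4.4, 8.8, 9.6],
    "Dataset 2": [0.1, 0.3, 1.2, 2.4, 4.4, 4.5, 8.0, 8.9, 9.3, 9.3],
    "Dataset 3": [1.1, 3.8, 5.3, 6.0, 6.2, 6.9, 7.9, 7.9, 8.1, 8.7],
    "Dataset 4": [3.4, 4.5, 5.4, 5.6, 7.0, 8.5, 8.9, 9.2, 9.7, 9.7],
}

# Store (mean, median) per dataset, rounded to two decimals.
summary = {name: (round(mean(x), 2), round(median(x), 2))
           for name, x in datasets.items()}

for name, (m, med) in summary.items():
    relation = ">" if m > med else "<" if m < med else "="
    print(f"{name}: mean {m} {relation} median {med}")
```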

\({\bf (14)}\) Consider the following set of 10 observations of a variable \(X\) sorted from least to greatest: \[3.3, 3.8, 4.0, 4.8, 4.8, 5.1, 5.2, 5.6, 5.7, 6.9\] Use the data to answer parts a-b

\({\bf (15)}\) Consider the following \(n = 20\) observations of the sugar and sodium content of several popular cereal brands and answer questions a - g:

| Brand | Sodium (mg) | Sugar (g) | Type |
|---|---|---|---|
| Frosted Mini Wheats | 0 | 11 | A |
| Raisin Bran | 340 | 18 | A |
| All Bran | 70 | 5 | A |
| Apple Jacks | 140 | 14 | C |
| Cap’n Crunch | 200 | 12 | C |
| Cheerios | 180 | 1 | C |
| Cinnamon Toast Crunch | 210 | 10 | C |
| Crackling Oat Bran | 150 | 16 | A |
| Fiber One | 100 | 0 | A |
| Frosted Flakes | 130 | 12 | C |
| Froot Loops | 140 | 14 | C |
| Honey Bunches of Oats | 180 | 7 | A |
| Honey Nut Cheerios | 190 | 9 | C |
| Life | 160 | 6 | C |
| Rice Krispies | 290 | 3 | C |
| Honey Smacks | 50 | 15 | A |
| Special K | 220 | 4 | A |
| Wheaties | 180 | 4 | A |
| Corn Flakes | 200 | 3 | A |
| Honeycomb | 210 | 11 | C |

(A) Histogram of the variable sugar (g) using the bins in the frequency table for Sugar (g)

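Frequency table used to construct the histogram for the variable “Sugar (g)”. Each bin includes its upper endpoint, so a cereal with exactly 4 g of sugar is counted in the 2 - 4 bin.

| Sugar (g) | Frequency | RF(x) | CRF(x) |
|---|---|---|---|
| ≤ 2 | 2 | 0.10 | 0.10 |
| 2 - 4 | 4 | 0.20 | 0.30 |
| 4 - 6 | 2 | 0.10 | 0.40 |
| 6 - 8 | 1 | 0.05 | 0.45 |
| 8 - 10 | 2 | 0.10 | 0.55 |
| 10 - 12 | 4 | 0.20 | 0.75 |
| 12 - 14 | 2 | 0.10 | 0.85 |
| 14 - 16 | 2 | 0.10 | 0.95 |
| > 16 | 1 | 0.05 | 1.00 |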
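The binned frequencies can be recomputed from the sugar column of the cereal data. The sketch below assumes each bin is closed on the right (a value equal to a bin's upper edge belongs to that bin), which is the convention that reproduces the counts in the table; the use of `bisect` is an implementation choice, not the original author's method.

```python
from bisect import bisect_left
from itertools import accumulate

# Sugar (g) for the 20 cereals, in the order they appear in the data table.
sugar = [11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
         14, 7, 9, 6, 3, 15, 4, 4, 3, 11]

# Upper edges of the bins; a final open-ended bin catches values above 16.
upper_edges = [2, 4, 6, 8, 10, 12, 14, 16]

freq = [0] * (len(upper_edges) + 1)
for s in sugar:
    # bisect_left gives the index of the first edge >= s,
    # so a value equal to an edge lands in the bin that edge closes.
    freq[bisect_left(upper_edges, s)] += 1

n = len(sugar)
rf = [round(f / n, 2) for f in freq]               # relative frequency
crf = [round(c / n, 2) for c in accumulate(freq)]  # cumulative relative frequency

print(freq)
print(rf)
print(crf)
```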


Figure: Boxplot of Sugar content in 20 cereal brands
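A boxplot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes it for the sugar data using the median-of-halves convention for the quartiles. That convention is an assumption: other quartile definitions give slightly different values for the first and third quartiles, so the box edges in the original figure may not match exactly.

```python
from statistics import median

# Sugar (g) for the 20 cereals, sorted from least to greatest.
sugar = sorted([11, 18, 5, 14, 12, 1, 10, 16, 0, 12,
                14, 7, 9, 6, 3, 15, 4, 4, 3, 11])

n = len(sugar)
half = n // 2

# Median-of-halves quartiles: split the sorted data at the median
# (dropping the middle value when n is odd) and take each half's median.
lower_half = sugar[:half]
upper_half = sugar[half:] if n % 2 == 0 else sugar[half + 1:]

five_number = (sugar[0], median(lower_half), median(sugar),
               median(upper_half), sugar[-1])

print(five_number)
```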

Figure: Cumulative Distribution of Sugar (g)