\(\bf (1)\) Rocket League is a popular online video game and esport released in 2015, with a following of around 93 million players per month. Players from around the world compete in sports such as soccer (football), basketball, and hockey while controlling RC-like vehicles. The most popular game mode is doubles soccer, in which two teams of two players each try to outscore the other by hitting a soccer ball into the opposing goal. During a match, the player-controlled vehicles often collide in what is commonly called a “bump”. The following histogram shows the distribution of the number of “bumps” per match for a given Rocket League player.

  1. Assuming a mean of \(\bar{x} = 355\) and standard deviation of \(s = 69\), in approximately what proportion of matches did the player have more than 423 bumps?

\[ z = \frac{423 - 355}{69} \approx 1 \] Therefore we want the proportion of observations with a value greater than \(\bar{x}+1s\), which by the empirical rule is \(13.5\%+2.5\% = 16\%\).
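The calculation above can be checked with a short Python sketch. The function names `z_score` and `normal_upper_tail` are illustrative and not part of the course materials, and the normal-model tail is shown only as a comparison under the assumption that bumps per match are roughly normal.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def normal_upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean_bumps, sd_bumps = 355, 69
z = z_score(423, mean_bumps, sd_bumps)

# Empirical rule: about 68% of values fall within 1 sd of the mean,
# so about (100% - 68%) / 2 = 16% lie above mean + 1 sd.
empirical_tail = (1 - 0.68) / 2

print(round(z, 2))               # 0.99
print(round(empirical_tail, 2))  # 0.16
print(normal_upper_tail(z))      # normal-model tail, close to the empirical-rule value
```

The empirical rule rounds \(z\) to 1, so the two tail values agree only approximately.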

  1. Approximately what proportion of matches did the player have more than 217 bumps but fewer than 493 bumps?

Approximately \(95\%\), since \(217 = \bar{x}-2s\) and \(493 = \bar{x}+2s\).

  1. How many bumps would the player need to have in a match to be in the top \(2.5\%\) of matches?

At least \(\bar{x}+2s = 355 + 2(69) = 493\) bumps.
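As a quick check of the two-standard-deviation cutoffs used in the last two answers, here is a minimal sketch. The helper name `cutoffs` is hypothetical.

```python
def cutoffs(mean, sd, k):
    """Return the pair (mean - k*sd, mean + k*sd)."""
    return mean - k * sd, mean + k * sd

mean_bumps, sd_bumps = 355, 69
low, high = cutoffs(mean_bumps, sd_bumps, 2)
print(low, high)  # 217 493
```

Under the empirical rule, about 95% of matches fall between these two values, leaving about 2.5% of matches above the upper cutoff.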

  1. After playing a new match, the player logs 620 bumps. Calculate the \(z\)-score for this match and explain the meaning of the z-score in relation to the average number of bumps. How many standard deviations is the player’s performance from the mean number of bumps?

\[ z = \frac{620 - 355}{69} \approx 3.84 \] The match is about 3.84 standard deviations above the player's mean number of bumps.

Figure 1: Pinniped whisker morphology (Ginter et al. 2012) and a juvenile Harp Seal.

\(\bf (2)\) Pinnipeds (seals and sea lions) possess the largest vibrissae (whiskers) among mammals, and their vibrissal hair shafts show a diversity of shapes (see Figure 1). In a study by Ginter et al. (2012), researchers measured 9 characteristics of pinniped whiskers in individuals from 9 species of seals and sea lions, with the goal of better characterizing whisker morphology and evolution. The following data are the recorded whisker lengths of 20 Harp Seals (given in Table 1). Note that the data have been sorted by increasing “Total Length (cm)” for your convenience. Use the table to answer the following questions:

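Table 1: Whisker length data for \(n = 20\) Harp Seals

| Observation | Whisker Total Length (cm) | Number of Beads |
|---|---|---|
| 1 | 4.18 | 2 |
| 2 | 4.23 | 2 |
| 3 | 4.31 | 2 |
| 4 | 4.31 | 2 |
| 5 | 4.33 | 2 |
| 6 | 4.34 | 2 |
| 7 | 4.35 | 2 |
| 8 | 4.38 | 2 |
| 9 | 4.40 | 2 |
| 10 | 4.41 | 2 |
| 11 | 4.43 | 3 |
| 12 | 4.44 | 2 |
| 13 | 4.46 | 2 |
| 14 | 4.47 | 3 |
| 15 | 4.48 | 3 |
| 16 | 4.51 | 2 |
| 17 | 4.59 | 3 |
| 18 | 4.60 | 2 |
| 19 | 4.62 | 2 |
| 20 | 5.08 | 3 |
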
## [1] "comma separated values = 4.18,4.23,4.31,4.31,4.33,4.34,4.35,4.38,4.4,4.41,4.43,4.44,4.46,4.47,4.48,4.51,4.59,4.6,4.62,5.08"
  1. Compute the quartiles, interquartile range, mean, and standard deviation for the variable “Total Length (cm)” (use the course website apps for help).

\[\begin{aligned} Q_1 &= \frac{4.33+4.34}{2} = 4.335 \\ \text{median} &= \frac{4.41+4.43}{2} = 4.42 \\ Q_3 &= \frac{4.48+4.51}{2} = 4.495 \\ IQR &= Q_3 - Q_1 = 4.495 - 4.335 = 0.16 \\ \bar{x} &= \frac{1}{20}\sum_{i = 1}^{20} x_i = \frac{4.18 + 4.23 + \cdots + 5.08}{20} = 4.446 \\ s &= \sqrt{\frac{1}{19}\sum_{i = 1}^{20}(x_i - \bar{x})^2} = \sqrt{\frac{(4.18 - 4.446)^2 + (4.23 - 4.446)^2 + \cdots + (5.08 - 4.446)^2}{19}} \approx 0.189 \end{aligned}\]
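These summary statistics can be reproduced with Python's standard `statistics` module. The sketch below assumes the median-of-halves convention for quartiles (the median of the lower 10 and of the upper 10 observations). Other software may use a different interpolation rule and return slightly different quartiles.

```python
import statistics

lengths = [4.18, 4.23, 4.31, 4.31, 4.33, 4.34, 4.35, 4.38, 4.40, 4.41,
           4.43, 4.44, 4.46, 4.47, 4.48, 4.51, 4.59, 4.60, 4.62, 5.08]

data = sorted(lengths)
half = len(data) // 2  # n = 20 is even, so the data split into two halves of 10

q1 = statistics.median(data[:half])   # median of the lower half
q2 = statistics.median(data)          # overall median
q3 = statistics.median(data[half:])   # median of the upper half
iqr = q3 - q1
mean = statistics.mean(data)
sd = statistics.stdev(data)           # sample standard deviation (divides by n - 1)

print(round(q1, 3), round(q2, 3), round(q3, 3), round(iqr, 3))  # 4.335 4.42 4.495 0.16
print(round(mean, 3), round(sd, 3))                              # 4.446 0.189
```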

  1. Make a boxplot of the variable “Total Length” and make note of any outliers.
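One common way to flag boxplot outliers is the \(1.5 \times IQR\) rule. The exercise does not name a rule, so the rule, the median-of-halves quartiles, and the variable names in this sketch are assumptions.

```python
import statistics

lengths = [4.18, 4.23, 4.31, 4.31, 4.33, 4.34, 4.35, 4.38, 4.40, 4.41,
           4.43, 4.44, 4.46, 4.47, 4.48, 4.51, 4.59, 4.60, 4.62, 5.08]

data = sorted(lengths)
half = len(data) // 2
q1 = statistics.median(data[:half])
q3 = statistics.median(data[half:])
iqr = q3 - q1

# Fences of the 1.5 * IQR rule (assumed here, not stated in the exercise)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(round(lower_fence, 3), round(upper_fence, 3))  # 4.095 4.735
print(outliers)                                      # [5.08]
```

Under this rule only observation 20 (5.08 cm) falls outside the fences.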

  1. Observation 20 has a total whisker length of 5.08 cm. How many standard deviations is this observation from the mean whisker length? (Hint: compute the \(z\)-score for this observation.)

The sample mean is \(\bar{x} = 4.446\) and the sample standard deviation is \(s = 0.189\), so \[ z = \frac{5.08 - \bar{x}}{s} = \frac{5.080 - 4.446}{0.189} \approx 3.354 \] Observation 20 is about 3.35 standard deviations above the mean whisker length.

\(\bf (3)\) Use the following pair of boxplots constructed from the Pinniped whisker data in \(\textbf{Table 1}\). The boxplots show the distribution of total whisker length (cm) for whiskers with 2 beads vs whiskers with 3 beads. Answer the following questions:

## [1] "-----------------2 bead whisker lengths-----------------"
## [1] "comma separated values =  4.3,4.6,4.5,4.6,4.2,4.3,4.3,4.2,4.4,4.3,4.5,4.3,4.4,4.4,4.4"
## [1] "-----------------3 bead whisker lengths-----------------"
## [1] "comma separated values =  4.6,5.1,4.5,4.4,4.5"
  1. Using the boxplots above, do the whisker lengths for 2 bead and 3 bead whiskers follow a normal distribution? Why or why not?

The distribution for 2 beads is approximately normal, as suggested by the roughly symmetric boxplot, but the distribution for 3 beads is skewed to the right.

  1. Suppose researchers observe two new Harp Seals (denoted Seal 1 and Seal 2) and record their total whisker lengths and the number of beads on the whiskers. Seal 1 has a total whisker length of 4.32 cm and the whisker has 3 beads. Seal 2 has a total whisker length of 4.34 cm but the whisker has only 2 beads. Suppose we want to know how they compare relative to the distribution of whiskers with the same number of beads. Confirm that the whisker length of Seal 1 has a \(z\)-score of \(-1.07\) while Seal 2 has a \(z\)-score of \(-0.43\). Refer to the boxplots above.

The mean lengths for whiskers with 2 beads and 3 beads are \(\bar{x}_{2beads} = 4.39\) and \(\bar{x}_{3beads} = 4.61\) respectively. The standard deviations in length for whiskers with 2 and 3 beads are \(s_{2beads} = 0.121\) and \(s_{3beads} = 0.270\) respectively. Seal 1 has a \(z\)-score of \[z_1 = \frac{4.32 - \bar{x}_{3beads}}{s_{3beads}} = \frac{4.32 - 4.61}{0.27} = -1.07\] and Seal 2 has a \(z\)-score of \[z_2 = \frac{4.34 - \bar{x}_{2beads}}{s_{2beads}} = \frac{4.34 - 4.39}{0.12} \approx -0.43\]
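The group means, standard deviations, and \(z\)-scores can be recomputed from Table 1 with the sketch below. The helper `group_summary` is hypothetical. Because the table lists lengths rounded to two decimals, the recomputed values may differ from the quoted ones in the second decimal place.

```python
import statistics

# (total length in cm, number of beads) for the 20 whiskers in Table 1
whiskers = [
    (4.18, 2), (4.23, 2), (4.31, 2), (4.31, 2), (4.33, 2),
    (4.34, 2), (4.35, 2), (4.38, 2), (4.40, 2), (4.41, 2),
    (4.43, 3), (4.44, 2), (4.46, 2), (4.47, 3), (4.48, 3),
    (4.51, 2), (4.59, 3), (4.60, 2), (4.62, 2), (5.08, 3),
]

def group_summary(rows, beads):
    """Sample mean and sample standard deviation of length for one bead count."""
    lengths = [length for length, b in rows if b == beads]
    return statistics.mean(lengths), statistics.stdev(lengths)

mean2, sd2 = group_summary(whiskers, 2)
mean3, sd3 = group_summary(whiskers, 3)

# Each seal is compared with whiskers that have the same number of beads
z_seal1 = (4.32 - mean3) / sd3  # Seal 1: 3 beads
z_seal2 = (4.34 - mean2) / sd2  # Seal 2: 2 beads

print(round(mean2, 2), round(sd2, 2), round(mean3, 2), round(sd3, 2))
print(round(z_seal1, 2), round(z_seal2, 2))
```

Both seals have negative \(z\)-scores, so each whisker is shorter than average for its bead count, with Seal 1 further below its group mean than Seal 2.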

  1. If the researchers were to observe the whisker length of a new seal and the whisker had 3 beads, how long or short would the whisker of this observation have to be for it to be considered an outlier? (assume the data is approximately normal)

Since the data are approximately symmetric (normal), we can use the 2 standard deviations rule: an observation \(x\) is an outlier if \(x < \bar{x}-2s\) or \(x > \bar{x}+2s\). The whisker would have to be longer than \[ \bar{x}_3 + 2s = 4.608 + 2\times 0.270 = 5.148 \text{ cm} \] or shorter than \[ \bar{x}_3 - 2s = 4.608 - 2\times 0.270 = 4.068 \text{ cm}\]

\(\bf (4)\) Use the following plot of the cumulative distribution of a quantitative variable \(x\) to answer questions (a)-(c)

## [1] "Comma separated values =  -4.7,-2.5,-2.3,-1.3,-1.3,-1.2,0,0,1,1.8,2.1,2.3,2.6,2.8,2.8,2.9,3.5,3.6,3.7,3.7,4.3,4.8,4.9,5.2,5.3,5.5,6,6.5,6.6,7.1,7.1,7.2,7.4,7.7,7.8,8.1,8.1,8.1,8.5,8.7,9,9,10,10.2,10.2,11,11,11.1,11.1,11.5"
  1. Assume that the mean and standard deviation of \(X\) are \(\bar{x} = 5\) and \(s = 4\). Use these values to find the 2.5th and 97.5th percentiles of \(X\). Compare your answer with the cumulative distribution plot above. Why might the two answers differ? (Hint: think about shape and how that relates to the empirical rule.)

Approximately \(-3\) and \(13\), since \(\bar{x} \pm 2s = 5 \pm 2(4)\). Reading from the plot above, the percentiles are approximately \(-2.5\) and \(11.1\). The answers differ because the distribution of \(X\) is skewed to the left, and the percentiles given by the empirical rule are only approximate for distributions that deviate from normality.
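The comparison can be sketched in Python using the 50 listed values. The nearest-rank definition of a percentile is an assumption made here for simplicity. Other definitions interpolate between neighbouring values and can give slightly different answers.

```python
import math

x = [-4.7, -2.5, -2.3, -1.3, -1.3, -1.2, 0, 0, 1, 1.8,
     2.1, 2.3, 2.6, 2.8, 2.8, 2.9, 3.5, 3.6, 3.7, 3.7,
     4.3, 4.8, 4.9, 5.2, 5.3, 5.5, 6, 6.5, 6.6, 7.1,
     7.1, 7.2, 7.4, 7.7, 7.8, 8.1, 8.1, 8.1, 8.5, 8.7,
     9, 9, 10, 10.2, 10.2, 11, 11, 11.1, 11.1, 11.5]

def nearest_rank_percentile(values, p):
    """Smallest value with at least p percent of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Percentiles read directly from the data
data_low = nearest_rank_percentile(x, 2.5)
data_high = nearest_rank_percentile(x, 97.5)

# Percentiles implied by the empirical rule with mean 5 and sd 4
rule_low, rule_high = 5 - 2 * 4, 5 + 2 * 4

print(data_low, data_high)  # -2.5 11.1
print(rule_low, rule_high)  # -3 13
```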

  1. Assume the above cumulative distribution represents an approximately normal distribution (symmetric, bell-shaped) with a standard deviation of \(s = 4\) and a mean of \(\bar{x} = 5\). Under the empirical rule, what percentage of the observations will have a value between \(-3\) and \(13\)?

If the mean is \(\bar{x} = 5\), then both \(-3\) and \(13\) are two standard deviations from the mean, i.e. \(-3 = \bar{x}-2s\) and \(13 = \bar{x}+2s\). Under the empirical rule, approximately \(95\%\) of the observations will fall in this interval.

  1. Using the mean and standard deviation from parts (a)-(b), compute and interpret the \(z\)-scores for the observations \(x_1 = 11.5\) and \(x_2 = -4.7\). Would these two observations be considered outliers under the \(\pm 2s\) rule?

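\[ z_1 = \frac{11.5 - 5}{4} \approx 1.63 \] The observation \(x_1 = 11.5\) is about 1.63 standard deviations above the mean. It is not an outlier, because \(1.63 < 2\). \[ z_2 = \frac{-4.7 - 5}{4} \approx -2.43 \] The observation \(x_2 = -4.7\) is about 2.43 standard deviations below the mean. It is an outlier, because \(-2.43 < -2\).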
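The \(\pm 2s\) check can be written as a small helper. This is a sketch, and `flag_outlier` is a hypothetical name.

```python
def flag_outlier(x, mean, sd, k=2):
    """Return the z-score of x and whether it lies more than k sds from the mean."""
    z = (x - mean) / sd
    return z, abs(z) > k

mean_x, sd_x = 5, 4
results = {value: flag_outlier(value, mean_x, sd_x) for value in (11.5, -4.7)}

for value, (z, is_outlier) in results.items():
    print(value, round(z, 3), is_outlier)
# 11.5 1.625 False
# -4.7 -2.425 True
```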

\(\bf (5)\) Define the following terms

  1. Explanatory variable - The independent variable which we manipulate to see how the response variable changes

  2. Response variable - The outcome or dependent variable - the variable on which we make comparisons for different values of the explanatory variable

  3. Experimental study - A study which assigns subjects/observations to one or more treatments and observes the outcome of the response variable

  4. Observational study - A nonexperimental study in which the researcher observes values of the response and explanatory variables for each subject/observation

  5. Survey - A type of nonexperimental study in which subjects/observations are selected from a population for the purpose of making inferences about the population.

\(\bf (6)\) Name the sampling method used in each of the following situations:

  1. A man is standing outside of a grocery store handing out questionnaires to shoppers asking them to evaluate their shopping experience and the quality of service. He does not ask shoppers who have their hands full carrying groceries, but instead asks all shoppers who have only a few items or have both hands free to fill out the questionnaire.

convenience sampling

  1. A gardener has a garden box in which he has planted 10 rows of corn. Each row consists of eight corn plants. The gardener wishes to know the average height of the corn plants in the garden box, so he randomly selects rows 1, 5, and 9 and measures the height of all corn plants in those rows.

cluster sampling

  1. A company wishes to know how employees feel about compensation and benefits. Over the next few weeks, the company asks 30 employees from each of its branch locations to fill out a survey regarding compensation.

stratified random sampling

  1. A local polling company wishes to know what proportion of citizens in Moscow, Idaho are voting for the Republican candidate in the mayoral race. The polling company records this information for every 5th voter who leaves the voting booth.

systematic sampling

  1. A lumber company wishes to estimate the amount of usable lumber on a plot of land. The company randomly selects 100 trees and measures their diameter, height, and volume, in order to estimate the number of board feet of each tree.

simple random sampling
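The four probability-based methods named above can be contrasted with a short simulation. This is a sketch on a hypothetical garden of 10 rows with 8 plants each (mirroring the corn example). The function names, plant labels, and the seed are illustrative assumptions. Convenience sampling is left out because it involves no random selection.

```python
import random

def simple_random_sample(units, n, rng):
    """Every subset of n units is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, step, rng):
    """Random starting point, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]

def stratified_sample(strata, n_per_stratum, rng):
    """A random sample of fixed size from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical garden: 10 rows, each with 8 labelled plants
rows = {row: [f"row{row}-plant{p}" for p in range(1, 9)] for row in range(1, 11)}
plants = [plant for members in rows.values() for plant in members]

srs = simple_random_sample(plants, 10, rng)   # 10 plants from the whole box
every_fifth = systematic_sample(plants, 5, rng)  # every 5th plant
per_row = stratified_sample(rows, 2, rng)     # 2 plants from each row
three_rows = cluster_sample(rows, 3, rng)     # all plants in 3 random rows
```

The key contrast is between the last two: stratified sampling takes a few units from every group, while cluster sampling takes every unit from a few groups.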

References

Ginter, Carly C, Thomas J DeWitt, Frank E Fish, and Christopher D Marshall. 2012. “Fused Traditional and Geometric Morphometrics Demonstrate Pinniped Whisker Diversity.” PLoS ONE 7 (4): e34481.