Definitions

Define the following terms/concepts


Confidence intervals and two-tailed tests:

\(\bf (1.)\) For parts (a) - (d), given the null hypothesis, the confidence interval for the parameter, and the point estimate, determine whether the null hypothesis \(H_0\) should be rejected or not rejected, and state the significance level \(\alpha\).

a.) \(H_0: \mu_0 = -2\)
\(H_A: \mu \neq \mu_0\)
\(95\%\) CI for \(\mu\): \([-3.0, -2.4]\)
\(\bar{x} = -2.7\)

\[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.05} \]

The hypothesized value \(-2\) lies outside the interval, so \(H_0\) is rejected.

b.) \(H_0: p_0 = 0.9\)
\(H_A: p \neq p_0\)
\(99\%\) CI for \(p\): \([0.968, 0.972]\)
\(\hat{p} = 0.97\)

\[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.01} \]

The hypothesized value \(0.9\) lies outside the interval, so \(H_0\) is rejected.

c.) \(H_0: \mu_0 = 0\)
\(H_A: \mu \neq \mu_0\)
\(90\%\) CI for \(\mu\): \([-3.63, 7.49]\)
\(\bar{x} = 1.93\)

\[ \text{Decision} = \color{red}{\text{fail to reject} \ H_0} \\ \alpha = \color{red}{0.1} \]

The hypothesized value \(0\) lies inside the interval, so \(H_0\) is not rejected.

d.) \(H_0: \mu_0 = 50\)
\(H_A: \mu \neq \mu_0\)
\(85\%\) CI for \(\mu\): \([93.6, 139]\)
\(\bar{x} = 116.3\)

\[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.15} \]

The hypothesized value \(50\) lies outside the interval, so \(H_0\) is rejected.
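
The link between a two-tailed test and a confidence interval can be sketched in a few lines of Python: \(H_0\) is rejected at level \(\alpha = 1 - \text{confidence}\) exactly when the hypothesized value falls outside the interval. The function name `ci_decision` and the `parts` dictionary are illustrative assumptions, not part of the assignment.

```python
def ci_decision(null_value, lower, upper, confidence):
    """Two-tailed decision from a confidence interval.

    Reject H0 when the hypothesized value falls outside the interval;
    the matching significance level is alpha = 1 - confidence.
    """
    alpha = round(1 - confidence, 10)  # round away floating-point noise
    inside = lower <= null_value <= upper
    return ("fail to reject H0" if inside else "reject H0"), alpha


# Parts (a)-(d): (null value, CI lower bound, CI upper bound, confidence level)
parts = {
    "a": (-2.0, -3.0, -2.4, 0.95),
    "b": (0.9, 0.968, 0.972, 0.99),
    "c": (0.0, -3.63, 7.49, 0.90),
    "d": (50.0, 93.6, 139.0, 0.85),
}
results = {part: ci_decision(*values) for part, values in parts.items()}
for part, (decision, alpha) in results.items():
    print(part, decision, alpha)
```

Only the null value and the interval bounds enter the decision; the point estimate sits at the centre of each interval and is always inside it.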

The sign test



\(\bf (2.)\) A dermatologist is conducting an experiment to test whether a new topical skin care cream reduces the appearance of facial acne. For each patient in a sample of \(20\) patients with acne, the researcher assigns a score from \(1-5\) (with \(5\) the most severe) representing the severity of their acne before the topical cream is applied. Patients are then asked to use the topical cream for \(10\) days. Following the \(10\) day treatment, patients are given a new acne score between \(1-5\). The data for this experiment are given in the table below:

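| Patient Number | Score Before Treatment | Score After Treatment | Difference | Sign |
| --- | --- | --- | --- | --- |
| 1 | 3 | 2 | \(1\) | \(+\) |
| 2 | 2 | 1 | \(1\) | \(+\) |
| 3 | 4 | 1 | \(3\) | \(+\) |
| 4 | 4 | 1 | \(3\) | \(+\) |
| 5 | 4 | 1 | \(3\) | \(+\) |
| 6 | 5 | 2 | \(3\) | \(+\) |
| 7 | 1 | 1 | \(0\) | |
| 8 | 5 | 4 | \(1\) | \(+\) |
| 9 | 5 | 1 | \(4\) | \(+\) |
| 10 | 2 | 1 | \(1\) | \(+\) |
| 11 | 2 | 2 | \(0\) | |
| 12 | 5 | 1 | \(4\) | \(+\) |
| 13 | 3 | 3 | \(0\) | |
| 14 | 3 | 1 | \(2\) | \(+\) |
| 15 | 1 | 1 | \(0\) | |
| 16 | 2 | 1 | \(1\) | \(+\) |
| 17 | 4 | 1 | \(3\) | \(+\) |
| 18 | 5 | 1 | \(4\) | \(+\) |
| 19 | 2 | 3 | \(-1\) | \(-\) |
| 20 | 1 | 1 | \(0\) | |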

The dermatologist is particularly interested in whether the new product can improve the appearance of the patients' acne. They plan to use a sign test of the null hypothesis \(H_0: p_0 = 0.5\) against the right-tailed alternative \(H_A: p>0.5\) to determine whether there is enough evidence, at the \(\alpha = 0.05\) level, to conclude that the topical cream has a positive effect on reducing facial acne.

a.) Fill in the table above by computing the difference in acne scores before and after applying the skin care cream. Record the sign of each difference and report the total number of positive signs \(s\).

b.) What does a “positive” sign indicate in this experiment? A positive sign indicates that the patient's acne score decreased after treatment, i.e. the skin cream reduced the appearance of acne.

c.) Compute the \(p\)-value for the sign test. After dropping the \(5\) patients with a difference of \(0\), \(15\) observations remain, and the number of positive signs is \(s = 14\). The \(p\)-value is the probability of observing \(14\) or more positive signs in \(15\) observations, given by \[p\text{-value} = P(S\geq s | H_0) = \sum_{k = 14}^{15}\frac{15!}{k!(15 - k)!} \cdot 0.5^k \cdot (1-0.5)^{15-k} \] \[ = \frac{15 + 1}{2^{15}} \approx 0.0005 \]

d.) Use the \(p\)-value from part (c) to determine if there is enough evidence to conclude that the topical cream product has a positive effect on reducing the appearance of acne. Interpret your decision in context.

At the \(\alpha = 0.05\) significance level, we reject the null hypothesis and conclude that there is sufficient evidence that the skin cream reduces the appearance of facial acne.
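
As a cross-check, the right-tailed sign test for this table can be sketched in Python using only the standard library. The function name `sign_test_right_tailed` is an illustrative assumption; zero differences are dropped before counting signs, which is the usual convention for the sign test.

```python
from math import comb


def sign_test_right_tailed(differences):
    """Right-tailed sign test of H0: p = 0.5 against HA: p > 0.5.

    Zero differences (ties) carry no sign and are dropped.
    Returns (positive signs, non-zero differences, p-value).
    """
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    s = sum(1 for d in nonzero if d > 0)
    # P(S >= s) for S ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
    return s, n, p_value


# Scores for patients 1-20, copied from the table
before = [3, 2, 4, 4, 4, 5, 1, 5, 5, 2, 2, 5, 3, 3, 1, 2, 4, 5, 2, 1]
after = [2, 1, 1, 1, 1, 2, 1, 4, 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 3, 1]
diffs = [b - a for b, a in zip(before, after)]

s, n, p_value = sign_test_right_tailed(diffs)
print(s, n, round(p_value, 4))
```

The binomial sum runs over the \(15\) non-zero differences rather than all \(20\) patients, because ties contribute no information about the direction of change.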



\(\bf (3.)\) In a boreal forest, researchers conducted an experiment to study how herbivores respond to variations in food plant quality and quantity at the stand level. They fertilized young forest stands and observed herbivore use over the subsequent year. The data collected included the number of animal tracks in the fertilized and control plots (Ball, Danell, and Sunesson 2000).

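| Observation | Number of Tracks in Fertilized Plots | Number of Tracks in Control Plots | Difference | Sign |
| --- | --- | --- | --- | --- |
| 1 | 15 | 10 | \(5\) | \(+\) |
| 2 | 12 | 9 | \(3\) | \(+\) |
| 3 | 18 | 11 | \(7\) | \(+\) |
| 4 | 14 | 8 | \(6\) | \(+\) |
| 5 | 16 | 12 | \(4\) | \(+\) |
| 6 | 13 | 10 | \(3\) | \(+\) |
| 7 | 17 | 11 | \(6\) | \(+\) |
| 8 | 14 | 9 | \(5\) | \(+\) |
| 9 | 19 | 13 | \(6\) | \(+\) |
| 10 | 15 | 10 | \(5\) | \(+\) |
| 11 | 11 | 8 | \(3\) | \(+\) |
| 12 | 16 | 12 | \(4\) | \(+\) |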

The researchers are interested in whether or not fertilizing stands increased herbivore activity. Conduct a sign test at the \(\alpha = 0.05\) significance level to determine if fertilized plots have significantly more herbivore tracks than control plots.

a.) Fill in the table above by computing the difference in animal tracks between the fertilized and control plots. Record the sign of each difference and report the total number of positive signs \(s\)

b.) What does a “positive” sign indicate in this experiment? A positive sign indicates that the fertilized stands have more animal tracks

c.) Compute the \(p\)-value for the sign test. The number of positive signs is \(s = 12\). The \(p\)-value is the probability of observing \(12\) or more positive signs in \(12\) observations, given by \[p\text{-value} = P(S\geq s | H_0) = \frac{12!}{12!(12 - 12)!} \cdot 0.5^{12}\cdot(1-0.5)^{12-12} \] \[ = 0.5^{12} \] \[ \approx 0.0002 \]

d.) Use the \(p\)-value from part (c) to determine if there is enough evidence to conclude that fertilized stands have more herbivore activity. At the \(\alpha = 0.05\) significance level we reject the null hypothesis and conclude that fertilized stands have significantly more animal activity than non-fertilized stands
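
To see how far \(s = 12\) lies inside the rejection region, one can also look for the smallest number of positive signs that would reject \(H_0\) at \(\alpha = 0.05\) with \(12\) non-zero differences. The sketch below uses only the standard library; the helper names `right_tail_p` and `critical_count` are illustrative assumptions.

```python
from math import comb


def right_tail_p(s, n):
    """P(S >= s) for S ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n


def critical_count(n, alpha):
    """Smallest s with P(S >= s) <= alpha, or None if no count qualifies."""
    for s in range(n + 1):
        if right_tail_p(s, n) <= alpha:
            return s
    return None


n, alpha = 12, 0.05
p_observed = right_tail_p(12, n)   # all 12 differences are positive
s_crit = critical_count(n, alpha)  # rejection region is S >= s_crit
print(p_observed, s_crit)
```

Under these assumptions any count at or above `s_crit` rejects \(H_0\), so the observed result of twelve positive signs out of twelve is the most extreme outcome possible.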



Comparing two population proportions \(p_1\) and \(p_2\)

\(\bf (4.)\) The United States Supreme Court serves as the judicial branch of the U.S. government, tasked with ensuring that laws conform to the U.S. Constitution. Although traditionally operating as a relatively discreet arbiter within the American government, recent years have seen an unusual spotlight on the Supreme Court in partisan politics. This heightened attention is largely attributed by political analysts to several landmark Supreme Court decisions and ethical controversies that have surfaced. Consequently, it is believed that this increased scrutiny has adversely affected public approval of the Court across the political spectrum. The table below provides a summary of current and historical polling results for assessing SCOTUS job approval. The current approval rating is summarized from \(192\) political polls conducted among the public since late 2022 (Ryan and Bycoffe 2024). The historic approval sentiment is characterized from \(39\) Gallup polls conducted since early 2001.

| Period | Parameter | Sample Size | Mean SCOTUS Approval Rating (%) |
| --- | --- | --- | --- |
| Current Approval (2022 - 2024) | \(p_1\) | 192 | 41.16 |
| Past Approval (Gallup: 2001 - 2022) | \(p_2\) | 39 | 50.87 |

Conduct a two-sample test at the \(\alpha = 0.05\) significance level to determine whether the current approval rating is significantly lower than the historical average - be sure to address all five steps of the hypothesis test.

  1. The pooled estimate of the common proportion is

\[\hat{p} = \frac{0.4116(192)+0.5087(39)}{192+39} = 0.428\]

\(n\hat{p} = 0.428(231) = 99\) and \(n(1-\hat{p}) = 0.572(231) = 132\) are both greater than the required \(15\), so the sample size is sufficient to meet the asymptotic requirement of the test.

  2. The null hypothesis is \(H_0: p_1 - p_2 = 0\) and the alternative hypothesis is \(H_A: p_1 - p_2 < 0\).

  3. The pooled standard error is

\[SE_{D_{pooled}} = \sqrt{0.428(1-0.428)\left(\frac{1}{192}+\frac{1}{39}\right)} = 0.0869\]

and the test statistic is

\[ Z_{obs} = \frac{(0.4116 - 0.5087)-0}{0.0869} = -1.12\]

  4. The \(p\)-value is given by

\[p\text{-value} = P(Z \leq -1.12| H_0) = 0.131\]

Note that your \(p\)-value may be slightly different depending on how you rounded.

  5. At the \(\alpha = 0.05\) significance level, we fail to reject the null hypothesis of no difference and conclude that the current SCOTUS approval rating is not significantly lower than the historical average of about \(50.9\%\) approving.
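
The arithmetic in the steps above can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `pooled_two_proportion_z` is an illustrative assumption, and the number of polls is treated as the sample size, as in the table.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic for H0: p1 - p2 = 0 from sample proportions."""
    p_pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, se, (p1 - p2) / se


p_pool, se, z = pooled_two_proportion_z(0.4116, 192, 0.5087, 39)
p_value = NormalDist().cdf(z)  # left tail, because HA is p1 - p2 < 0
print(round(p_pool, 3), round(se, 4), round(z, 2), round(p_value, 3))
```

Because the code carries full precision through every step, its \(p\)-value can differ in the third decimal from a hand calculation that looks up a rounded \(Z\) in a table.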



\(\bf (5.)\) A study published in the Journal of Paediatrics and Child Health was interested in the number of reports of child abuse in Sergipe, Brazil, before and after COVID-19 started. The researchers hypothesized that children in abusive homes might be in more danger because they weren’t going to school and their families were under more stress (Martins-Filho et al. 2020). A summary of the study is reported below:

| Period | Parameter | Total Registries | Registries for Children under 12 Years of Age | Relative Proportion |
| --- | --- | --- | --- | --- |
| Before COVID-19 (2019) | \(p_1\) | 70 | 18 | 0.26 |
| After COVID-19 (2020) | \(p_2\) | 53 | 16 | 0.30 |

Conduct a two-sample test at the \(\alpha = 0.01\) significance level to determine whether the rate of reported child abuse for children under the age of \(12\) was significantly different after the start of the COVID-19 pandemic - be sure to address all five steps of the hypothesis test

  1. The pooled estimate of the common proportion is

\[\hat{p} = \frac{18+16}{70+53} = 0.276\]

\(n\hat{p} = 0.276(123) = 34\) and \(n(1-\hat{p}) = 0.724(123) = 89\) are both greater than the required \(15\), so the sample size is sufficient to meet the asymptotic requirement of the test.

  2. The null hypothesis is \(H_0: p_1 - p_2 = 0\) and the alternative hypothesis is \(H_A: p_1 - p_2 \neq 0\).

  3. The pooled standard error is

\[SE_{D_{pooled}} = \sqrt{0.276(1-0.276)\left(\frac{1}{70}+\frac{1}{53}\right)} = 0.0814\]

and the test statistic is

\[ Z_{obs} = \frac{(0.26 - 0.30)-0}{0.0814} = -0.491\]

  4. The \(p\)-value is given by

\[ p\text{-value} = P(|Z|\geq|-0.491||H_0)\] \[ = 2\times \left[1 - P(Z < 0.491|H_0)\right] = 0.623\]

Note that your \(p\)-value may be slightly different depending on how you rounded.

  5. At the \(\alpha = 0.01\) significance level, we fail to reject the null hypothesis and conclude that the proportion of reported child abuse cases involving children under 12 years of age is not significantly different before and after the start of the COVID-19 pandemic.
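
The remark about rounding can be made concrete with a short Python sketch that runs the two-sided pooled test twice from the raw counts: once with the sample proportions rounded to two decimals, as in the table, and once at full precision. The function name `two_sided_pooled_z` and its `digits` parameter are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_pooled_z(x1, n1, x2, n2, digits=None):
    """Two-sided pooled z-test from counts.

    If digits is given, the sample proportions are rounded to that many
    decimals before differencing, mimicking a hand calculation.
    """
    p1, p2 = x1 / n1, x2 / n2
    if digits is not None:
        p1, p2 = round(p1, digits), round(p2, digits)
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z_rounded, p_rounded = two_sided_pooled_z(18, 70, 16, 53, digits=2)
z_exact, p_exact = two_sided_pooled_z(18, 70, 16, 53)
print(round(z_rounded, 3), round(p_rounded, 3))
print(round(z_exact, 3), round(p_exact, 3))
```

Both versions give a \(p\)-value far above \(\alpha = 0.01\), so the decision does not depend on the rounding.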


Comparing two population means \(\mu_1\) and \(\mu_2\)

\(\bf (6.)\) A study published in the journal Physical Culture and Sport. Studies and Research explored what motivates athletes to keep playing sports, comparing team sports like football with individual ones like taekwondo (Moradi, Bahrami, and Dana 2020). The study used stratified random sampling to survey \(265\) athletes from four team disciplines (football, volleyball, basketball, handball) and two individual disciplines (kung fu and taekwondo). The study evaluated motivational factors affecting sport participation based on eight fields: achievement/status, teamwork, fitness, energy release, situational factors, skill development, friendship, and fun. Each field was evaluated with a survey that quantified sentiment using three-level Likert scale responses (very important=3, somewhat important=2, and not important=1). The results of the study are summarized in the table below:

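| Motivational Factor | Sport Type | Sample Mean | Sample Stdev. | Sample Size |
| --- | --- | --- | --- | --- |
| Participation (total) | Team | 75.67 | 9.38 | 203 |
| Participation (total) | Individual | 80.50 | 8.40 | 62 |
| Achievement | Team | 15.03 | 2.71 | 203 |
| Achievement | Individual | 15.72 | 3.03 | 62 |
| Teamwork | Team | 8.11 | 1.21 | 203 |
| Teamwork | Individual | 7.68 | 1.23 | 62 |
| Energy Release | Team | 11.88 | 2.05 | 203 |
| Energy Release | Individual | 12.85 | 2.07 | 62 |
| Fitness | Team | 7.79 | 1.27 | 203 |
| Fitness | Individual | 8.40 | 0.94 | 62 |
| Situational Factors | Team | 7.48 | 1.44 | 203 |
| Situational Factors | Individual | 8.33 | 0.98 | 62 |
| Skill Development | Team | 7.94 | 1.28 | 203 |
| Skill Development | Individual | 8.42 | 1.07 | 62 |
| Friendship | Team | 9.57 | 1.78 | 203 |
| Friendship | Individual | 10.50 | 1.51 | 62 |
| Fun | Team | 7.74 | 1.20 | 203 |
| Fun | Individual | 8.18 | 1.04 | 62 |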

Conduct a pooled (assume the sample variances are equal) two independent samples \(t\)-test at the \(\alpha = 0.05\) significance level to determine whether fitness as a motivating factor is significantly different between team and individual sports.

  1. The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) where \(\mu_1\) is the mean score for fitness for team sports and \(\mu_2\) is the mean score for fitness for individual sports. The alternative hypothesis is \(H_A: \mu_1 - \mu_2 \neq 0\).

  2. The pooled estimate of the sample standard deviation is

\[s_{pooled} = \sqrt{\frac{(203 - 1)1.27^2 + (62-1)0.94^2}{203+62 - 2}} = 1.202 \]

The estimated standard error, assuming the samples are independent and have the same variance, is given by

\[ SE(\hat{\mu}_d) = 1.202\cdot\sqrt{\frac{1}{203}+\frac{1}{62}} = 0.174\]

The test statistic is

\[ t_{obs} = \frac{(7.79 - 8.40) - 0}{0.174} = -3.506\]

where \(t_{obs}\) is approximately \(t\)-distributed with \(203+62 - 2 = 263\) degrees of freedom.

  3. The \(p\)-value is

\[ p\text{-value} = P(|t|\geq |t_{obs}||H_0) \] \[= 2\left[P(t \geq 3.506 | H_0)\right] \approx 0.0005\]

  4. At the \(\alpha = 0.05\) significance level, we reject the null hypothesis and conclude that fitness as a motivator is significantly different for athletes in team sports vs individual sports.
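
The pooled \(t\)-test can be reproduced from the summary statistics with a short Python sketch. The standard library has no \(t\) distribution, so the tail probability is approximated here by Simpson's rule on the \(t\) density; that numerical shortcut and the helper names `t_upper_tail` and `pooled_t_test` are assumptions made for illustration.

```python
from math import sqrt, lgamma, exp, log, pi


def t_upper_tail(t, df, steps=20000):
    """P(T >= t) for t >= 0, where T is Student's t with df degrees of freedom.

    Computed as 0.5 minus a Simpson's-rule integral of the density on [0, t].
    steps must be even.
    """
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - total * h / 3


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Two-sided pooled-variance t-test from means, stdevs, and sample sizes."""
    df = n1 + n2 - 2
    s_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_pooled * sqrt(1 / n1 + 1 / n2)
    t_obs = (m1 - m2) / se
    p_value = 2 * t_upper_tail(abs(t_obs), df)
    return s_pooled, se, t_obs, df, p_value


# Fitness: team (mean 7.79, sd 1.27, n 203) vs individual (8.40, 0.94, 62)
s_pooled, se, t_obs, df, p_value = pooled_t_test(7.79, 1.27, 203, 8.40, 0.94, 62)
print(round(s_pooled, 3), round(se, 3), round(t_obs, 2), df, round(p_value, 4))
```

Carrying full precision gives a test statistic near \(-3.50\), slightly smaller in magnitude than a hand calculation that divides by the rounded standard error; the conclusion at \(\alpha = 0.05\) is unaffected.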



\(\bf (7.)\) A 1993 study published in the Canadian Journal of Zoology investigated the characteristics of maturation and growth in two lowland populations (Servotte and Thevenon) of European Common Frogs Rana temporaria (Augert and Joly 1993) in Southeast France from 1986 - 1989. The body lengths of male and female frogs aged 1 - 4 years in both populations are summarized below.

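| Age (Years) | Population Location | Male Body Length (mm) | Male Stdev | Male Sample Size | Female Body Length (mm) | Female Stdev | Female Sample Size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Servotte | 32.5 | 5.3 | 45 | 33.4 | 5.4 | 76 |
| 2 | Servotte | 55.8 | 4.7 | 11 | 59.7 | 6.2 | 15 |
| 3 | Servotte | 61.2 | 5.3 | 35 | 65.6 | 5.2 | 27 |
| 4 | Servotte | 64.7 | 7.3 | 12 | 69.6 | 0.5 | 15 |
| 1 | Thevenon | 32.9 | 3.9 | 49 | 32.8 | 4.3 | 76 |
| 2 | Thevenon | 51.8 | 3.2 | 26 | 53.3 | 6.0 | 29 |
| 3 | Thevenon | 59.9 | 5.4 | 27 | 59.9 | 6.4 | 21 |
| 4 | Thevenon | 61.0 | 4.1 | 12 | 63.0 | 4.3 | 13 |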

Conduct a two independent samples \(t\)-test at the \(\alpha = 0.05\) significance level to evaluate whether there is a statistically significant difference in the mean body lengths of four-year-old male frogs between the Servotte and Thévenon populations. Assume the sample variances are NOT equal.

  1. The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) where \(\mu_1\) is the mean male body length for frogs in the Servotte population and \(\mu_2\) is the mean male body length for frogs in the Thevenon population. The alternative hypothesis is \(H_A: \mu_1 - \mu_2 \neq 0\).

  2. The estimated standard error, assuming the samples are independent and do not have the same variance, is given by

\[ SE(\hat{\mu}_d) = \sqrt{\frac{7.3^2}{12}+\frac{4.1^2}{12}} = 2.417\]

The test statistic is

\[ t_{obs} = \frac{(64.7 - 61.0) - 0}{2.417} = 1.531\]

where \(t_{obs}\) is approximately \(t\)-distributed with \(\min(12, 12) - 1 = 11\) degrees of freedom.

  3. The \(p\)-value is

\[ p\text{-value} = P(|t|\geq |t_{obs}||H_0) \] \[= 2\left[P(t \geq 1.531 | H_0)\right] = 2(0.077) = 0.154\]

  4. At the \(\alpha = 0.05\) significance level, we fail to reject the null hypothesis and conclude that the mean body length of four-year-old male frogs is not significantly different between the two populations.
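
The unequal-variance standard error and test statistic can be sketched in Python as follows. The solution uses the conservative \(11\) degrees of freedom; the Welch-Satterthwaite approximation computed alongside it is an alternative that statistical software commonly reports, included here only for comparison and not taken from the solution. The function name `welch_summary` is an illustrative assumption.

```python
from math import sqrt


def welch_summary(m1, s1, n1, m2, s2, n2):
    """Unequal-variance t statistic with two choices of degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-sample variance of the mean
    se = sqrt(v1 + v2)
    t_obs = (m1 - m2) / se
    df_conservative = min(n1, n2) - 1
    # Welch-Satterthwaite approximation (an alternative, see lead-in)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, t_obs, df_conservative, df_welch


# Four-year-old males: Servotte (64.7, 7.3, 12) vs Thevenon (61.0, 4.1, 12)
se, t_obs, df_cons, df_welch = welch_summary(64.7, 7.3, 12, 61.0, 4.1, 12)
print(round(se, 3), round(t_obs, 3), df_cons, round(df_welch, 2))
```

The conservative choice has fewer degrees of freedom and therefore a heavier-tailed reference distribution, so its \(p\)-value is somewhat larger than the Welch-Satterthwaite one for the same \(t_{obs}\).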

References

Augert, Dominique, and Pierre Joly. 1993. “Plasticity of Age at Maturity Between Two Neighbouring Populations of the Common Frog (Rana temporaria L.).” Canadian Journal of Zoology 71 (1): 26–33.
Ball, John P, Kjell Danell, and Peter Sunesson. 2000. “Response of a Herbivore Community to Increased Food Quality and Quantity: An Experiment with Nitrogen Fertilizer in a Boreal Forest.” Journal of Applied Ecology 37 (2): 247–55.
Martins-Filho, Paulo R, Nicole P Damascena, Renata CM Lage, and Karyna B Sposato. 2020. “Decrease in Child Abuse Notifications During COVID-19 Outbreak: A Reason for Worry or Celebration?” Journal of Paediatrics and Child Health 56 (12): 1980.
Moradi, Jalil, Alireza Bahrami, and Amir Dana. 2020. “Motivation for Participation in Sports Based on Athletes in Team and Individual Sports.” Physical Culture and Sport. Studies and Research 85 (1): 14–21.
Ryan, Best, and Aaron Bycoffe. 2024. “National: President: General Election: 2024 Polls.” FiveThirtyEight. https://projects.fivethirtyeight.com/polls/president-general/2024/national/.