Definitions

Define the following terms/concepts

a.) Non-parametric statistics - A branch of statistics which does not assume that the sample data comes from a specific probability distribution.
b.) Parametric statistics - A branch of statistics which assumes the sample data comes from a specific probability distribution that depends on a set of fixed parameters.
c.) Describe the advantages and disadvantages of Non-parametric significance tests

Advantages
1. They may be the only alternative when sample sizes are very small (unless the population distribution is known exactly, but this is almost never the case)
2. They make few assumptions about the population distribution of the data
3. They are advantageous when the data represent crude measurements such as subjective ratings or rankings (e.g., Likert responses)
4. They often have simpler computations and interpretations than parametric tests
Disadvantages
1. They are generally less powerful than their parametric analogues when the parametric assumptions actually hold

d.) Explanatory Variable - In bivariate statistical procedures, the explanatory variable is the independent variable that usually defines the groups that will be compared on the response variable
e.) Response Variable - The dependent variable that measures the outcome of interest in each group

Confidence intervals and two-tailed tests:

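\(\bf (1.)\) For parts (a) - (d), use the given confidence interval for the parameter to decide whether the null hypothesis \(H_0\) should be rejected or not rejected, and state the significance level \(\alpha\). For a two-sided test, \(H_0\) is rejected at level \(\alpha\) when the null value lies outside the \(100(1-\alpha)\%\) confidence interval and is not rejected when it lies inside; in each part the point estimate sits at the center of the interval.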

a.) \(H_0: \mu = \mu_0 = -2\)
\(H_A: \mu \neq \mu_0\)
\(95\%\) CI for \(\mu\): \([-3.0, -2.4]\)
\(\bar{x} = -2.7\) \[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.05} \]
Since \(\mu_0 = -2\) lies outside \([-3.0, -2.4]\), \(H_0\) is rejected.

b.) \(H_0: p = p_0 = 0.9\)
\(H_A: p \neq p_0\)
\(99\%\) CI for \(p\): \([0.968, 0.972]\)
\(\hat{p} = 0.97\)

\[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.01} \]
Since \(p_0 = 0.9\) lies outside \([0.968, 0.972]\), \(H_0\) is rejected.

c.) \(H_0: \mu = \mu_0 = 0\)
\(H_A: \mu \neq \mu_0\)
\(90\%\) CI for \(\mu\): \([-3.63, 7.49]\)
\(\bar{x} = 1.93\)

\[ \text{Decision} = \color{red}{\text{fail to reject} \ H_0} \\ \alpha = \color{red}{0.1} \]
Since \(\mu_0 = 0\) lies inside \([-3.63, 7.49]\), \(H_0\) is not rejected.

d.) \(H_0: \mu = \mu_0 = 50\)
\(H_A: \mu \neq \mu_0\)
\(85\%\) CI for \(\mu\): \([93.6, 139]\)
\(\bar{x} = 116.3\)

\[ \text{Decision} = \color{red}{\text{reject} \ H_0} \\ \alpha = \color{red}{0.15} \]
Since \(\mu_0 = 50\) lies outside \([93.6, 139]\), \(H_0\) is rejected.
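For a two-sided test, \(H_0\) is rejected at level \(\alpha\) exactly when the null value falls outside the \(100(1-\alpha)\%\) confidence interval. The short Python sketch below applies that check to parts (a) - (d); `ci_decision` is a hypothetical helper name, and it sorts the endpoints so the order in which an interval is written does not matter. The midpoint is printed only as a cross-check on the point estimate, assuming each interval is symmetric.

```python
def ci_decision(null_value, lower, upper):
    """Two-sided test via the CI: reject H0 if null_value lies outside [lower, upper]."""
    lo, hi = min(lower, upper), max(lower, upper)  # tolerate a reversed interval
    return "fail to reject H0" if lo <= null_value <= hi else "reject H0"

# (part, null value, CI lower, CI upper, confidence level) for parts (a)-(d)
parts = [
    ("a", -2.0, -3.0, -2.4, 0.95),
    ("b", 0.9, 0.968, 0.972, 0.99),
    ("c", 0.0, -3.63, 7.49, 0.90),
    ("d", 50.0, 93.6, 139.0, 0.85),
]

decisions = {}
for label, null_value, lower, upper, conf in parts:
    alpha = round(1 - conf, 2)            # significance level matching the CI
    midpoint = (lower + upper) / 2        # equals the point estimate for a symmetric CI
    decisions[label] = ci_decision(null_value, lower, upper)
    print(f"({label}) alpha = {alpha}, estimate = {midpoint:.3f}: {decisions[label]}")
```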

The sign test

\(\bf (2.)\) A dermatologist is conducting an experiment to test whether a new topical skin care cream reduces the appearance of facial acne. For a sample of \(20\) patients with acne, the researcher scores the severity of each patient's acne on a scale from \(1\) to \(5\) (with \(5\) the most severe) before the topical cream is applied. Patients are then asked to use the topical cream for \(10\) days. Following the \(10\) day treatment, patients are given a new acne score between \(1\) and \(5\). The data for this experiment are given in the table below:

Patient Number	Score Before Treatment	Score After Treatment	Difference (Before - After)	Sign
1	3	2	\(1\)	\(+\)
2	2	1	\(1\)	\(+\)
3	4	1	\(3\)	\(+\)
4	4	1	\(3\)	\(+\)
5	4	1	\(3\)	\(+\)
6	5	2	\(3\)	\(+\)
7	1	1	\(0\)
8	5	4	\(1\)	\(+\)
9	5	1	\(4\)	\(+\)
10	2	1	\(1\)	\(+\)
11	2	2	\(0\)
12	5	1	\(4\)	\(+\)
13	3	3	\(0\)
14	3	1	\(2\)	\(+\)
15	1	1	\(0\)
16	2	1	\(1\)	\(+\)
17	4	1	\(3\)	\(+\)
18	5	1	\(4\)	\(+\)
19	2	3	\(-1\)	\(-\)
20	1	1	\(0\)

The dermatologist is particularly interested in whether the new product can reduce the appearance of acne. They plan to test the null hypothesis \(H_0: p = 0.5\) against the right-tailed alternative \(H_A: p > 0.5\), where \(p\) is the probability that a patient's acne score decreases (among patients whose score changes), using a sign test to determine whether there is enough evidence at the \(\alpha = 0.05\) level to conclude that the topical cream has a positive effect on reducing facial acne.

a.) Fill in the table above by computing the difference in acne scores (before minus after) for each patient. Record the sign of each difference and report the total number of positive signs \(s\). There are \(s = 14\) positive signs, \(1\) negative sign, and \(5\) zero differences (ties).

b.) What does a “positive” sign indicate in this experiment? A positive sign indicates that the patient's acne score decreased after treatment, i.e., the acne appeared less severe after using the skin care cream.

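c.) Compute the \(p\)-value for the sign test. The \(5\) patients whose scores did not change (difference of \(0\)) are ties and are dropped, leaving \(n = 15\) non-zero differences, of which \(s = 14\) are positive. Under \(H_0\), \(S \sim \text{Binomial}(15, 0.5)\), so the \(p\)-value is the probability of observing \(14\) or more positive signs in \(15\) observations: \[p\text{-value} = P(S\geq s | H_0) = \sum_{k = 14}^{15}\frac{15!}{k!(15 - k)!} \cdot 0.5^k \cdot (1-0.5)^{15-k} = \frac{15 + 1}{2^{15}} \] \[ \approx 0.0005 \]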
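The sign-test calculation in part (c) can be reproduced from the table with a minimal Python sketch (standard library only). It assumes, as the sign test does, that zero differences are ties to be discarded, so \(n\) is the number of non-zero differences, and it computes the exact right-tailed binomial probability with `math.comb`.

```python
from math import comb

# (score before, score after) for patients 1-20, copied from the table
scores = [(3, 2), (2, 1), (4, 1), (4, 1), (4, 1), (5, 2), (1, 1), (5, 4), (5, 1), (2, 1),
          (2, 2), (5, 1), (3, 3), (3, 1), (1, 1), (2, 1), (4, 1), (5, 1), (2, 3), (1, 1)]

diffs = [before - after for before, after in scores]  # positive = acne score went down
nonzero = [d for d in diffs if d != 0]                 # ties carry no sign and are dropped
n = len(nonzero)                                       # number of usable signs
s = sum(1 for d in nonzero if d > 0)                   # number of positive signs

# Right-tailed exact p-value: P(S >= s) for S ~ Binomial(n, 0.5)
p_value = sum(comb(n, k) for k in range(s, n + 1)) / 2 ** n
print(n, s, round(p_value, 5))
```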

d.) Use the \(p\)-value from part (c) to determine if there is enough evidence to conclude that the topical cream product has a positive effect on reducing the appearance of acne. Interpret your decision in context.

Since the \(p\)-value \(\approx 0.0005 < 0.05\), we reject the null hypothesis at the \(\alpha = 0.05\) significance level and conclude that there is sufficient evidence that the topical cream reduces the severity of facial acne.

\(\bf (3.)\) In a boreal forest, researchers conducted an experiment to study how herbivores respond to variations in food plant quality and quantity at the stand level. They fertilized young forest stands and observed herbivore use over the subsequent year. The data collected included the number of animal tracks in the fertilized and control plots (Ball, Danell, and Sunesson 2000).

Observation	Number of Tracks in Fertilized Plot	Number of Tracks in Control Plot	Difference (Fertilized - Control)	Sign
1	15	10	\(5\)	\(+\)
2	12	9	\(3\)	\(+\)
3	18	11	\(7\)	\(+\)
4	14	8	\(6\)	\(+\)
5	16	12	\(4\)	\(+\)
6	13	10	\(3\)	\(+\)
7	17	11	\(6\)	\(+\)
8	14	9	\(5\)	\(+\)
9	19	13	\(6\)	\(+\)
10	15	10	\(5\)	\(+\)
11	11	8	\(3\)	\(+\)
12	16	12	\(4\)	\(+\)

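The researchers are interested in whether fertilizing stands increased herbivore activity. Conduct a sign test of \(H_0: p = 0.5\) against \(H_A: p > 0.5\), where \(p\) is the probability that a fertilized plot has more tracks than its paired control plot, at the \(\alpha = 0.05\) significance level to determine whether fertilized plots have significantly more herbivore tracks than control plots.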

a.) Fill in the table above by computing the difference in animal tracks between the fertilized and control plots. Record the sign of each difference and report the total number of positive signs \(s\)

b.) What does a “positive” sign indicate in this experiment? A positive sign indicates that the fertilized stands have more animal tracks

c.) Compute the \(p\)-value for the sign test. None of the differences is zero, so \(n = 12\), and the number of positive signs is \(s = 12\). The \(p\)-value is the probability of observing \(12\) or more positive signs in \(12\) observations: \[p\text{-value} = P(S\geq s | H_0) = \frac{12!}{12!(12 - 12)!} \cdot 0.5^{12}\cdot(1-0.5)^{12-12} \] \[ = 0.5^{12} = \frac{1}{4096} \] \[ \approx 0.00024 \]
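The same exact binomial calculation, applied to the track counts, is sketched below in Python. It recomputes the differences from the table, drops any ties (there are none here), and uses `fractions.Fraction` to keep the tail probability exact, which shows the \(p\)-value is small but not zero.

```python
from fractions import Fraction
from math import comb

# Track counts from the table, observations 1-12
fertilized = [15, 12, 18, 14, 16, 13, 17, 14, 19, 15, 11, 16]
control = [10, 9, 11, 8, 12, 10, 11, 9, 13, 10, 8, 12]

diffs = [f - c for f, c in zip(fertilized, control)]  # positive = more tracks when fertilized
nonzero = [d for d in diffs if d != 0]                 # ties would be dropped
n = len(nonzero)
s = sum(d > 0 for d in nonzero)                        # count of positive signs

# Exact right-tailed p-value P(S >= s) for S ~ Binomial(n, 1/2)
p_value = Fraction(sum(comb(n, k) for k in range(s, n + 1)), 2 ** n)
print(diffs, n, s, p_value, float(p_value))
```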

d.) Use the \(p\)-value from part (c) to determine if there is enough evidence to conclude that fertilized stands have more herbivore activity. Since the \(p\)-value \(\approx 0.00024 < 0.05\), we reject the null hypothesis at the \(\alpha = 0.05\) significance level and conclude that fertilized stands have significantly more herbivore activity than control (unfertilized) stands.

Comparing two population proportions \(p_1\) and \(p_2\)

\(\bf (4.)\) The United States Supreme Court serves as the judicial branch of the U.S. government, tasked with ensuring that laws conform to the U.S. Constitution. Although the Court has traditionally operated as a relatively discreet arbiter within the American government, in recent years it has drawn unusual attention in partisan politics. Political analysts largely attribute this heightened attention to several landmark Supreme Court decisions and to ethical controversies that have surfaced, and many believe the increased scrutiny has lowered public approval of the Court across the political spectrum. The table below summarizes current and historical polling results on SCOTUS job approval. The current approval rating is summarized from \(192\) political polls of the public conducted since late 2022 (Ryan and Bycoffe 2024). The historical approval rating is characterized from \(39\) Gallup polls conducted since early 2001.

Period	Parameter	Sample Size (Polls)	Mean SCOTUS Approval Rating (%)
Current Approval (2022 - 2024)	\(p_1\)	192	41.16
Past Approval (Gallup: 2001 - 2022)	\(p_2\)	39	50.87

Conduct a two-sample test at the \(\alpha = 0.05\) significance level to determine whether the current approval rating is significantly lower than the historical average - be sure to address all five steps of the hypothesis test.

The pooled estimate of the common proportion under \(H_0\) is

\[\hat{p} = \frac{0.4116(192)+0.5087(39)}{192+39} = 0.428\]

\(n\hat{p} = 0.428(231) = 99\) and \(n(1-\hat{p}) = 0.572(231) = 132\) are both greater than the required \(15\) and thus the sample size is sufficient to meet the asymptotic requirement of the test

The null hypothesis is \(H_0: p_1 - p_2 = 0\) and the alternative hypothesis is \(H_A: p_1 - p_2 < 0\)
The pooled standard error is

\[SE_{D_{pooled}} = \sqrt{0.428(1-0.428)\left(\frac{1}{192}+\frac{1}{39}\right)} = 0.0869\]

The test statistic is

\[ Z_{obs} = \frac{(0.4116 - 0.5087)-0}{0.0869} = -1.117\]

The \(p\)-value is given by

\[p\text{-value} = P(Z \leq -1.11| H_0) = 0.132\] Note that your \(p\)-value may be slightly different depending on how you rounded

At the \(\alpha = 0.05\) significance level, we fail to reject the null hypothesis of no difference (\(p\)-value \(\approx 0.132 > 0.05\)): there is not sufficient evidence that the current SCOTUS approval rating is lower than the historical average of about \(50.9\%\) approving.
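A minimal Python sketch of the pooled two-proportion \(z\)-test above, using `statistics.NormalDist` from the standard library for the normal CDF. Like the hand calculation, it treats the mean approval ratings as sample proportions and the number of polls as the sample sizes; intermediate values are not rounded, so the last digit may differ slightly.

```python
from math import sqrt
from statistics import NormalDist

# Sample proportions and sample sizes (number of polls), as in the hand calculation
p1_hat, n1 = 0.4116, 192   # current approval, 2022 - 2024
p2_hat, n2 = 0.5087, 39    # historical Gallup approval, 2001 - 2022
alpha = 0.05

# Pooled estimate of the common proportion under H0: p1 - p2 = 0
p_pool = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)

# Pooled standard error of the difference in sample proportions
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z_obs = (p1_hat - p2_hat) / se
p_value = NormalDist().cdf(z_obs)   # left-tailed, since H_A: p1 - p2 < 0

print(round(p_pool, 3), round(se, 4), round(z_obs, 3), round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```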

\(\bf (5.)\) A study published in the Journal of Paediatrics and Child Health was interested in the number of reports of child abuse in Sergipe, Brazil, before and after COVID-19 started. The researchers hypothesized that children in abusive homes might be in more danger because they weren’t going to school and their families were under more stress (Martins-Filho et al. 2020). A summary of the study is reported below:

Period	Parameter	Total Registries	Registries for Children under 12 Years of Age	Relative Proportion
Before COVID-19 (2019)	\(p_1\)	70	18	0.26
After COVID-19 (2020)	\(p_2\)	53	16	0.30

Conduct a two-sample test at the \(\alpha = 0.01\) significance level to determine whether the rate of reported child abuse for children under the age of \(12\) was significantly different after the start of the COVID-19 pandemic - be sure to address all five steps of the hypothesis test

The pooled estimate of the common proportion under \(H_0\) is

\[\hat{p} = \frac{18+16}{70+53} = 0.276\]

\(n\hat{p} = 0.276(123) = 34\) and \(n(1-\hat{p}) = 0.724(123) = 89\) are both greater than the required \(15\) and thus the sample size is sufficient to meet the asymptotic requirement of the test

The null hypothesis is \(H_0: p_1 - p_2 = 0\) and the alternative hypothesis is \(H_A: p_1 - p_2 \neq 0\)
The pooled standard error is

\[SE_{D_{pooled}} = \sqrt{0.276(1-0.276)\left(\frac{1}{70}+\frac{1}{53}\right)} = 0.0814\]

The test statistic is

\[ Z_{obs} = \frac{(0.26 - 0.30)-0}{0.0814} = -0.491\]

The \(p\)-value is given by

\[ p\text{-value} = P(|Z|\geq|-0.491| \mid H_0)\] \[ = 2\times \left[1 - P(Z < 0.491|H_0)\right] = 0.623\]

Note that your \(p\)-value may be slightly different depending on how you rounded

At the \(\alpha = 0.01\) significance level, we fail to reject the null hypothesis and conclude that the proportions of reported child abuse cases involving children under 12 years of age before and after the start of the COVID-19 pandemic are not significantly different.
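The same pooled \(z\)-test, run two-sided from the raw counts in the table, is sketched below in Python (standard library only). Because it uses the unrounded proportions \(18/70\) and \(16/53\) instead of \(0.26\) and \(0.30\), its \(z\) statistic and \(p\)-value differ somewhat from the hand calculation, as the note on rounding anticipates, but the decision at \(\alpha = 0.01\) is the same.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 18, 70   # registries for children under 12, 2019 (before COVID-19)
x2, n2 = 16, 53   # registries for children under 12, 2020 (after COVID-19)
alpha = 0.01

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                        # common proportion under H0
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # pooled standard error

z_obs = (p1_hat - p2_hat) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))      # two-sided: H_A is p1 - p2 != 0

print(round(p_pool, 3), round(se, 4), round(z_obs, 3), round(p_value, 3))
print("reject H0" if p_value < alpha else "fail to reject H0")
```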

Comparing two population means \(\mu_1\) and \(\mu_2\)

\(\bf (6.)\) A study published in the journal Physical Culture and Sport. Studies and Research explored what motivates athletes to keep playing sports, comparing team sports like football with individual ones like taekwondo (Moradi, Bahrami, and Dana 2020). The study used stratified random sampling to survey \(265\) athletes from four team disciplines (football, volleyball, basketball, handball) and two individual disciplines (kung fu and taekwondo). The study evaluated motivational factors affecting sport participation in eight fields: achievement/status, teamwork, fitness, energy release, situational factors, skill development, friendship, and fun. Each field was evaluated with a survey that quantified sentiment using three-level Likert scale responses (very important = 3, somewhat important = 2, and not important = 1). The results of the study are summarized in the table below:

Motivational Factor	Sport Type	Sample Mean	Sample Stdev.	Sample Size
Participation (total)	Team	75.67	9.38	203
	Individual	80.50	8.40	62
Achievement	Team	15.03	2.71	203
	Individual	15.72	3.03	62
Teamwork	Team	8.11	1.21	203
	Individual	7.68	1.23	62
Energy Release	Team	11.88	2.05	203
	Individual	12.85	2.07	62
Fitness	Team	7.79	1.27	203
	Individual	8.40	0.94	62
Situational Factors	Team	7.48	1.44	203
	Individual	8.33	0.98	62
Skill Development	Team	7.94	1.28	203
	Individual	8.42	1.07	62
Friendship	Team	9.57	1.78	203
	Individual	10.50	1.51	62
Fun	Team	7.74	1.20	203
	Individual	8.18	1.04	62

Conduct a pooled two independent samples \(t\)-test (assume the population variances are equal) at the \(\alpha = 0.05\) significance level to determine whether fitness as a motivating factor differs significantly between team and individual sports.

The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) where \(\mu_1\) is the mean score for fitness for team sports and \(\mu_2\) is the mean score for fitness for individual sports. The alternative hypothesis is \(H_A: \mu_1 - \mu_2 \neq 0\)
The pooled estimate of the common standard deviation is

\[s_{pooled} = \sqrt{\frac{(203 - 1)1.27^2 + (62-1)0.94^2}{203+62 - 2}} = 1.201 \]

The estimated standard error, assuming the samples are independent and drawn from populations with the same variance, is given by

\[ SE(\hat{\mu}_d) = 1.201\cdot\sqrt{\frac{1}{203}+\frac{1}{62}} = 0.174\]

The test statistic is

\[ t_{obs} = \frac{(7.79 - 8.40) - 0}{0.174} = -3.505\]

where \(t_{obs}\) is approximately \(t\)-distributed with \(203+62 - 2 = 263\) degrees of freedom

The \(p\)-value is

\[ p\text{-value} = P(|t|\geq |t_{obs}||H_0) \] \[= 2\left[P(t \geq 3.505 | H_0)\right] = 0.00054\]

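At the \(\alpha = 0.05\) significance level, we reject the null hypothesis and conclude that the importance of fitness as a motivator differs significantly between athletes in team sports and those in individual sports, with individual-sport athletes rating fitness higher on average (\(8.40\) vs \(7.79\)).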
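The pooled \(t\)-test above can be checked with the Python sketch below. The standard library has no \(t\)-distribution CDF, so the sketch integrates the \(t\) density numerically with Simpson's rule; `t_pdf` and `t_upper_tail` are hypothetical helper names standing in for a library routine such as `scipy.stats.t.sf`. Intermediate values are not rounded, so \(t_{obs}\) and the \(p\)-value may differ in the last digit from the hand calculation.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0, computed as 0.5 minus the integral of the density over [0, t]."""
    steps += steps % 2                      # Simpson's rule needs an even number of panels
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Fitness scores from the table: team sports (1) and individual sports (2)
mean1, sd1, n1 = 7.79, 1.27, 203
mean2, sd2, n2 = 8.40, 0.94, 62

df = n1 + n2 - 2
s_pooled = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)  # pooled standard deviation
se = s_pooled * sqrt(1 / n1 + 1 / n2)
t_obs = (mean1 - mean2) / se
p_value = 2 * t_upper_tail(abs(t_obs), df)                     # two-sided

print(df, round(s_pooled, 3), round(se, 3), round(t_obs, 3), round(p_value, 5))
```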

\(\bf (7.)\) A 1993 study published in the Canadian Journal of Zoology investigated maturation and growth in two lowland populations (Servotte and Thevenon) of the European common frog, Rana temporaria, in southeastern France from 1986 - 1989 (Augert and Joly 1993). The mean body lengths of male and female frogs aged 1 - 4 years in both populations are summarized below.

Age (Years)	Population Location	Male Body Length (mm)	Stdev	Sample size	Female Body Length (mm)	Stdev	Sample Size
1	Servotte	32.5	5.3	45	33.4	5.4	76
2	Servotte	55.8	4.7	11	59.7	6.2	15
3	Servotte	61.2	5.3	35	65.6	5.2	27
4	Servotte	64.7	7.3	12	69.6	0.5	15
1	Thevenon	32.9	3.9	49	32.8	4.3	76
2	Thevenon	51.8	3.2	26	53.3	6.0	29
3	Thevenon	59.9	5.4	27	59.9	6.4	21
4	Thevenon	61.0	4.1	12	63.0	4.3	13

Conduct a two independent samples \(t\)-test at the \(\alpha = 0.05\) significance level to evaluate whether there is a statistically significant difference in the mean body lengths of four-year-old male frogs between the Servotte and Thévenon populations. Assume the population variances are NOT equal.

The null hypothesis is \(H_0: \mu_1 - \mu_2 = 0\) where \(\mu_1\) is the mean male body length for frogs in the Servotte population and \(\mu_2\) is the mean male body length for frogs in the Thevenon population. The alternative hypothesis is \(H_A: \mu_1 - \mu_2 \neq 0\)
The estimated standard error assuming the samples are independent and do not have the same variance is given by

\[ SE(\hat{\mu}_d) = \sqrt{\frac{7.3^2}{12}+\frac{4.1^2}{12}} = 2.417\]

The test statistic is

\[ t_{obs} = \frac{(64.7 - 61.0)}{2.417} = 1.531\]

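where, under \(H_0\), \(t_{obs}\) is approximately \(t\)-distributed; we use the conservative choice of \(\min(n_1, n_2) - 1 = 11\) degrees of freedom (the Welch-Satterthwaite approximation gives about \(17.3\) degrees of freedom and a slightly smaller \(p\)-value)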

The \(p\)-value is

\[ p\text{-value} = P(|t|\geq |t_{obs}| \mid H_0) \] \[= 2\left[P(t \geq 1.531 | H_0)\right] \approx 2(0.077) = 0.154\]

At the \(\alpha = 0.05\) significance level, we fail to reject the null hypothesis and conclude that the mean body length of four-year-old male frogs does not differ significantly between the two populations.
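A Python sketch of the unequal-variance test, using a numerical \(t\)-tail helper (hypothetical names `t_pdf` and `t_upper_tail`, with Simpson's rule standing in for a library CDF). It reports the two-sided \(p\)-value with the conservative \(\min(n_1, n_2) - 1 = 11\) degrees of freedom and, for comparison, with the Welch-Satterthwaite degrees of freedom \(\frac{(v_1+v_2)^2}{v_1^2/(n_1-1) + v_2^2/(n_2-1)}\), where \(v_i = s_i^2/n_i\).

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution (df may be non-integer)."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for t >= 0 via Simpson's rule on [0, t]."""
    steps += steps % 2                      # Simpson's rule needs an even number of panels
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Four-year-old males: Servotte (1) and Thevenon (2)
mean1, sd1, n1 = 64.7, 7.3, 12
mean2, sd2, n2 = 61.0, 4.1, 12

v1, v2 = sd1**2 / n1, sd2**2 / n2      # estimated variance of each sample mean
se = sqrt(v1 + v2)                      # unpooled standard error
t_obs = (mean1 - mean2) / se

df_conservative = min(n1, n2) - 1
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

p_values = {}
for label, df in [("conservative", df_conservative), ("Welch", df_welch)]:
    p_values[label] = 2 * t_upper_tail(abs(t_obs), df)  # two-sided
    print(f"{label}: df = {df:.1f}, t = {t_obs:.3f}, p = {p_values[label]:.3f}")
```

Both choices of degrees of freedom give a \(p\)-value well above \(0.05\), so the decision does not depend on which one is used.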

References

Augert, Dominique, and Pierre Joly. 1993. “Plasticity of Age at Maturity Between Two Neighbouring Populations of the Common Frog (Rana temporaria L.).” Canadian Journal of Zoology 71 (1): 26–33.

Ball, John P, Kjell Danell, and Peter Sunesson. 2000. “Response of a Herbivore Community to Increased Food Quality and Quantity: An Experiment with Nitrogen Fertilizer in a Boreal Forest.” Journal of Applied Ecology 37 (2): 247–55.

Martins-Filho, Paulo R, Nicole P Damascena, Renata CM Lage, and Karyna B Sposato. 2020. “Decrease in Child Abuse Notifications During COVID-19 Outbreak: A Reason for Worry or Celebration?” Journal of Paediatrics and Child Health 56 (12): 1980.

Moradi, Jalil, Alireza Bahrami, and Amir Dana. 2020. “Motivation for Participation in Sports Based on Athletes in Team and Individual Sports.” Physical Culture and Sport. Studies and Research 85 (1): 14–21.