The exam is scheduled for Wednesday, April 17th, and will be available to take online in Canvas through the following Sunday, April 21st
You will have 2 hours to complete the exam once you start it
I will host an open Zoom session for students to ask questions and reach out in case they have problems during the exam - click the link to join
The exam will be about 20 - 25 questions long and will be open book/note
This exam will cover material from Lectures 15 - 26 (Week 8 to beginning of Week 14)
Know/be able to define the following terms and concepts:
know the conditions which make the \(t\) distribution equal to the \(z\) distribution
The Central Limit Theorem (CLT) - a fundamental concept in statistics that states that the sampling distribution of the sample mean or sample proportion approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, as long as the sample size is sufficiently large.
Margin of error - measures the uncertainty or precision associated with estimating a population parameter (such as a population mean or proportion) by quantifying the amount by which a sample estimate might deviate from the true population parameter; it is the half-width of a confidence interval.
Confidence level - the probability that the interval will contain the true parameter value before the data are gathered
Critical value - The point on the distribution of the test statistic that defines a set of values that call for rejecting the null hypothesis, i.e., the value which defines the rejection region of a test.
Statistical significance - A statistical result that is decidedly not due to “ordinary variation” in the data (i.e., not due to chance or not a coincidence)
Statistical test - a type of statistical inference used to decide whether an observed result is statistically significant
\(p\)-value - the probability of observing a value as extreme or more extreme than the test statistic under the assumption that the null hypothesis is true
Null hypothesis - The hypothesis of “no effect”. It usually states that the population parameter is equal to some value
Alternative hypothesis - The hypothesis of “effect”. It usually states that the population parameter is in some range of value
Test statistic - a statistic which measures the distance between the point estimate of the parameter and the hypothesized value of the parameter under the null hypothesis
Rejection region - the interval, measured in the sampling distribution of the statistic under study, that leads to rejection of the null hypothesis \(H_0\) in a significance test.
Type I Error - a false positive test result: rejecting the null hypothesis when the null hypothesis is true.
Type II Error - a false negative test result: failing to reject the null hypothesis when the alternative hypothesis is true.
Statistical Power - the probability of rejecting the null hypothesis when the alternative is true.
Know what factors influence the power of a significance test and what can be done to increase the power of a significance test
Know the definitions and differences between Parametric vs Non-parametric significance tests
Know the advantages and disadvantages of Parametric vs Non-parametric significance tests
Know how a confidence interval relates to a two-tailed statistical test and how to use a confidence interval to make a decision about a significance test.
\[\text{point estimate}\pm \text{standard score} \times \text{standard error} \]
or
\[\text{point estimate}\pm \text{margin of error} \]
What effect does increasing the confidence level have on the margin of error and the confidence interval?
What effect does increasing the sample size have on the margin of error and the confidence interval?
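The two effects above can be checked numerically; here is a small Python sketch (the example numbers are invented for illustration) computing the margin of error for a proportion at different confidence levels and sample sizes:

```python
import math
from statistics import NormalDist

# Margin of error for a proportion: m = z * sqrt(p_hat * (1 - p_hat) / n)
def margin_of_error(p_hat, n, conf_level):
    # z is the standard-normal quantile for the desired confidence level
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

m90 = margin_of_error(0.5, 100, 0.90)     # baseline
m99 = margin_of_error(0.5, 100, 0.99)     # higher confidence level
m_big_n = margin_of_error(0.5, 400, 0.90) # larger sample size

# Higher confidence level -> larger margin of error (wider interval)
assert m99 > m90
# Larger sample size -> smaller margin of error (narrower interval);
# quadrupling n halves the margin of error
assert m_big_n < m90
```

Note that quadrupling the sample size only halves the margin of error, because \(n\) appears under a square root.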
Know the correct way(s) to report a confidence interval
Understand why it is incorrect to talk about a confidence interval in a probabilistic way
understand that the confidence level of a confidence interval relates to the long-run frequency of CIs that will cover the true value of the parameter
\[ \hat{p} \pm z\times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
\[ n\hat{p} \geq 15 \ \ \text{and} \ \ n(1-\hat{p}) \geq 15 \]
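The proportion interval above can be computed directly; a short Python sketch (the counts \(x = 60\) out of \(n = 100\) are made up for illustration):

```python
import math
from statistics import NormalDist

x, n = 60, 100        # hypothetical successes and sample size
p_hat = x / n

# Check the large-sample conditions first
assert n * p_hat >= 15 and n * (1 - p_hat) >= 15

z = NormalDist().inv_cdf(0.975)          # ~1.96 for 95% confidence
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
lower, upper = p_hat - z * se, p_hat + z * se
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```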
\[ \bar{x} \pm t\times \frac{s}{\sqrt{n}} \]
\(t\) is the standard score from a \(t\)-distribution with \(n-1\) degrees of freedom
Assumptions: data are randomized; the population distribution of \(x\) is approximately normal
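A Python sketch of the \(t\) interval for a mean (SciPy supplies the \(t\) quantile; the sample values are invented for illustration):

```python
import math
from scipy import stats

# Hypothetical sample data
data = [4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0]
n = len(data)
xbar = sum(data) / n
# Sample standard deviation (divide by n - 1)
s = math.sqrt(sum((v - xbar) ** 2 for v in data) / (n - 1))

t_crit = stats.t.ppf(0.975, df=n - 1)  # 95% confidence, n - 1 degrees of freedom
moe = t_crit * s / math.sqrt(n)        # margin of error
lower, upper = xbar - moe, xbar + moe
print(f"95% CI for mu: ({lower:.3f}, {upper:.3f})")
```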
Know how to compute the sample size when estimating \(p\) with \(\hat{p}\)
\[ n = \frac{p(1-p)z^2}{m^2} \]
Know how to compute the sample size when estimating \(\mu\) with \(\bar{x}\)
\[ n = \frac{\sigma^2z^2}{m^2} \]
\(z\) is the standard score from a standard normal distribution corresponding to the desired confidence level
\(m\) is the desired margin of error
\(p\) and \(\sigma\) must be determined from prior information or studies
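Both sample-size formulas can be sketched in Python; the prior values (\(p = 0.3\), \(\sigma = 12\)) and margins are made up for illustration. Note the result is always rounded up so the achieved margin of error is at most \(m\):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # 95% confidence level

# Sample size to estimate p within margin m; p from a prior study (assumed 0.3)
p, m = 0.3, 0.04
n_prop = math.ceil(p * (1 - p) * z**2 / m**2)

# Sample size to estimate mu within margin m; sigma from a prior study (assumed 12)
sigma, m_mu = 12, 2
n_mean = math.ceil(sigma**2 * z**2 / m_mu**2)

print(n_prop, n_mean)  # round UP, never down
```

If no prior estimate of \(p\) is available, \(p = 0.5\) gives the most conservative (largest) sample size, since \(p(1-p)\) is maximized there.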
know the five steps to conduct a significance test:
Assumptions
State null and alternative hypotheses
Compute test statistic
Compute \(p\)-value
Make a decision
Be sure you understand and know how to apply the decision rule in hypothesis testing
\[\text{Decision Rule:}\begin{cases}
\text{if} \ p\text{-value} \leq \alpha & \text{Reject} \ H_0 \\
\text{else} & \text{do not reject} \ H_0
\end{cases}\]
Assumptions: \(n\) is large; \(np_0 \geq 15\) and \(n(1-p_0) \geq 15\) (the standard error uses \(p_0\), so the conditions are checked at the null value)
State the null and alternative hypotheses
Right-tailed test: \(H_0: p = p_0 \ \ H_A: p > p_0\)
Left-tailed test: \(H_0: p = p_0 \ \ H_A: p < p_0\)
Two-tailed test: \(H_0: p = p_0 \ \ H_A: p \neq p_0\)
where \(p_0\) is the hypothesized value (e.g., \(p_0 = 0.5\))
\[ z_{\text{obs}} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1) \]
Right-tailed test: \(p\)-value = \(P(z \geq z_{\text{obs}} | \ H_0 \text{true})\)
Left-tailed test: \(p\)-value = \(P(z \leq z_{\text{obs}} | \ H_0 \text{true})\)
Two-tailed test: \(p\)-value = \(P(|z| \geq |z_{\text{obs}}| \ | \ H_0 \text{true})\)
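The five steps for the one-proportion \(z\) test can be sketched in Python; the counts (\(x = 62\) out of \(n = 100\), testing \(p_0 = 0.5\)) are made up for illustration:

```python
import math
from statistics import NormalDist

# Two-tailed one-proportion z test of H0: p = 0.5 (hypothetical data)
x, n, p0 = 62, 100, 0.5
p_hat = x / n

se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0 (uses p0, not p_hat)
z_obs = (p_hat - p0) / se0
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))  # two-tailed p-value

print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}")
alpha = 0.05
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```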
Assumptions: The population distribution of the random variable \(x\) is approximately normal (this is especially important if \(n\) is small and the significance test is one-sided)
State the null and alternative hypotheses
Right-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu > \mu_0\)
Left-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu < \mu_0\)
Two-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu \neq \mu_0\)
\[ t_{\text{obs}} = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \sim t(n-1) \]
Right-tailed test: \(p\)-value = \(P(t \geq t_{\text{obs}} | \ H_0 \text{true})\)
Left-tailed test: \(p\)-value = \(P(t \leq t_{\text{obs}} | \ H_0 \text{true})\)
Two-tailed test: \(p\)-value = \(P(|t| \geq |t_{\text{obs}}| \ | \ H_0 \text{true})\)
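SciPy's `ttest_1samp` carries out this test directly; the sketch below (with invented data, testing \(\mu_0 = 5\)) also recomputes \(t_{\text{obs}}\) by hand from the formula above to show they agree:

```python
import math
from scipy import stats

# One-sample, two-tailed t test of H0: mu = 5 (hypothetical data)
data = [4.2, 5.1, 4.8, 5.6, 4.9, 5.3, 4.7, 5.0]
mu0 = 5

res = stats.ttest_1samp(data, popmean=mu0)  # two-tailed by default
print(f"t_obs = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")

# The same statistic by hand: t = (xbar - mu0) / (s / sqrt(n))
n = len(data)
xbar = sum(data) / n
s = math.sqrt(sum((v - xbar) ** 2 for v in data) / (n - 1))
t_obs = (xbar - mu0) / (s / math.sqrt(n))
assert abs(t_obs - res.statistic) < 1e-9
```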
Understand how to form and interpret the null and alternative hypotheses for a sign test
be able to interpret the meaning of a positive sign (in context) and the probability of a positive sign \(P(+ \ \text{sign})\)
Know how to compute the test statistic for a sign test (i.e number of positive signs)
Know how to compute the \(p\)-value for a sign test
Know that the sign test is used for data with matched-pair designs
Know how this test relates to the binomial distribution
Know why we don’t count zeros in the total number of signs for a sign test
Steps of a sign test:
Assumptions: data are collected from a randomization scheme. Observations are independent
State the null and alternative hypotheses:
Two-tailed test: \(H_0: p = 0.5\) \(H_A: p \neq 0.5\)
Right-tailed test: \(H_0: p = 0.5\) \(H_A: p > 0.5\)
Left-tailed test: \(H_0: p = 0.5\) \(H_A: p < 0.5\)
where \(p\) is the probability of a positive sign and \(p_0 = 0.5\) is its value under \(H_0\)
The test statistic \(S\) is the total number of positive signs, which under \(H_0\) follows a binomial distribution \(S \sim \text{binom}(n, p_0)\), where \(n\) is the total number of nonzero signs
The \(p\)-value is computed from a binomial distribution under the null:
Two-tailed test (assuming \(s\) is in the upper tail):
\[2\times P(S\geq s| H_0) = 2\times \left[ 1 - P(S<s| H_0)\right] = 2 \times \left[ 1 - \sum_{k = 0}^{s - 1}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k}\right] \]
Right-tailed test:
\[P(S\geq s| H_0) = 1 - P(S<s| H_0) = 1 - \sum_{k = 0}^{s - 1}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k} \]
Left-tailed test:
\[P(S\leq s| H_0) = \sum_{k = 0}^{s}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k} \]
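Because the null distribution is exactly binomial, the sign test reduces to a binomial tail probability. A Python sketch using SciPy (the counts, 9 positive signs out of 12 nonzero signs, are invented for illustration):

```python
from scipy import stats

# Sign test: S = number of positive signs out of n nonzero signs (zeros dropped)
s, n = 9, 12   # hypothetical counts

# Two-tailed exact binomial test; under H0, S ~ binom(n, 0.5)
res = stats.binomtest(s, n, p=0.5, alternative="two-sided")
print(f"two-tailed p-value = {res.pvalue:.4f}")

# Right-tailed p-value by hand: P(S >= s | H0) = 1 - P(S <= s - 1 | H0)
p_right = 1 - stats.binom.cdf(s - 1, n, 0.5)
print(f"right-tailed p-value = {p_right:.4f}")
```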
Know how to conduct a statistical test concerning \(p_1 - p_2\)
Know how to compute a confidence interval to estimate \(p_1 - p_2\)
Understand that the formulas below are two independent samples only!
\((1-\alpha)100\%\) Confidence interval:
\[ (\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
Hypothesis Testing:
\[ n_i\hat{p} \geq 15 \ \ \text{and} \ \ n_i(1-\hat{p}) \geq 15 \ \ \text{for each sample} \ i = 1, 2 \]
where \(\hat{p}\) is the pooled estimate of the population proportion
Two-tailed test: \(H_0: p_1-p_2 = 0\) \(H_A: p_1 - p_2 \neq 0\)
Right-tailed test: \(H_0: p_1 - p_2 = 0\) \(H_A: p_1 - p_2 > 0\)
Left-tailed test: \(H_0: p_1 - p_2 = 0\) \(H_A: p_1 - p_2 < 0\)
Test statistic:
\[ Z_{obs} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim N(0,1) \]
\(\hat{p}\) is called the pooled estimate of the common proportion:
\[\hat{p} = \frac{x_1+x_2}{n_1+n_2}\]
where \(x_1\) and \(x_2\) are the numbers of observations in the “success” category for sample 1 and sample 2.
\(p\)-value:
Two-tailed test \(H_A: p_1 - p_2 \neq 0\): \(p\)-value \(= P(|z|\geq |Z_{obs}| \ | \ H_0) = 2\times\left[1 - P(z<|Z_{obs}| \ | \ H_0)\right]\)
Right-tailed test \(H_A: p_1 - p_2 > 0\): \(p\)-value \(= P(z\geq Z_{obs}| H_0) = 1 - P(z<Z_{obs}| H_0)\)
Left-tailed test \(H_A: p_1 - p_2 < 0\): \(p\)-value \(= P(z\leq Z_{obs}| H_0)\)
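The two-proportion test and interval use different standard errors (pooled for the test, unpooled for the CI); the Python sketch below makes that distinction explicit. The counts (55/100 vs 40/100) are made up for illustration:

```python
import math
from statistics import NormalDist

# Hypothetical counts: successes / sample size for each group
x1, n1 = 55, 100
x2, n2 = 40, 100
p1, p2 = x1 / n1, x2 / n2

# Pooled proportion -> standard error for the test statistic (under H0: p1 = p2)
p_pool = (x1 + x2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_obs = (p1 - p2) / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))  # two-tailed

# Unpooled standard error -> confidence interval for p1 - p2
z = NormalDist().inv_cdf(0.975)
se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lower, upper = (p1 - p2) - z * se_ci, (p1 - p2) + z * se_ci

print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}")
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```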
Know how to conduct and interpret a statistical test concerning \(\mu_1 - \mu_2\)
Know how to compute a confidence interval to estimate \(\mu_1 - \mu_2\)
Understand the formulas below are for two independent samples only!
\((1-\alpha)100\%\) Confidence interval:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2} \times \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \]
Hypothesis Testing:
Assumptions: The two samples are independent, and the population distributions of both are approximately normal (this is especially important if \(n\) is small and the significance test is one-sided)
Hypotheses:
Two-tailed test: \(H_0: \mu_1-\mu_2 = 0\) \(H_A: \mu_1 - \mu_2 \neq 0\)
Right-tailed test: \(H_0: \mu_1 - \mu_2 = 0\) \(H_A: \mu_1 - \mu_2 > 0\)
Left-tailed test: \(H_0: \mu_1 - \mu_2 = 0\) \(H_A: \mu_1 - \mu_2 < 0\)
\[ t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\sim t(\min(n_1 - 1, n_2 - 1)) \]
Two-tailed test \(H_A: \mu_1 - \mu_2 \neq 0\): \(p\)-value \(= P(|t|\geq |t_{obs}| \ | \ H_0) = 2\times\left[1 - P(t<|t_{obs}| \ | \ H_0)\right]\)
Right-tailed test \(H_A: \mu_1 - \mu_2 > 0\): \(p\)-value \(= P(t\geq t_{obs}| H_0) = 1 - P(t<t_{obs}| H_0)\)
Left-tailed test \(H_A: \mu_1 - \mu_2 < 0\): \(p\)-value \(= P(t\leq t_{obs}| H_0)\)
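The two-sample test and interval can be sketched in Python with the conservative degrees of freedom \(\min(n_1-1, n_2-1)\) used above; the data are invented for illustration. (SciPy's `ttest_ind(..., equal_var=False)` instead uses the larger Welch–Satterthwaite df, so its p-value would be slightly smaller.)

```python
import math
from scipy import stats

# Hypothetical samples
x = [5.1, 4.8, 5.6, 4.9, 5.3, 4.7]
y = [4.4, 4.9, 4.2, 4.6, 4.8]
n1, n2 = len(x), len(y)

xbar, ybar = sum(x) / n1, sum(y) / n2
s1 = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n1 - 1))  # sample std devs
s2 = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n2 - 1))

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # unpooled standard error
t_obs = (xbar - ybar) / se
df = min(n1 - 1, n2 - 1)                  # conservative degrees of freedom
p_value = 2 * (1 - stats.t.cdf(abs(t_obs), df))  # two-tailed

# 95% CI for mu1 - mu2 with the same df
t_crit = stats.t.ppf(0.975, df)
lower, upper = (xbar - ybar) - t_crit * se, (xbar - ybar) + t_crit * se

print(f"t_obs = {t_obs:.3f}, df = {df}, p-value = {p_value:.4f}")
print(f"95% CI for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

With these numbers the two-tailed p-value exceeds 0.05 and the 95% CI contains 0, illustrating the correspondence between a two-tailed test and a confidence interval.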