Exam Logistics:

Content To Study

This exam will cover material from Lectures 15 - 26 (Week 8 to beginning of Week 14)

Concepts and terminology

Know/be able to define the following terms and concepts:

  • Know the conditions under which the \(t\) distribution equals the \(z\) distribution

  • The Central Limit Theorem (CLT) - a fundamental result in statistics stating that the sampling distribution of the sample mean (or sample proportion) approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the sample size is sufficiently large.

  • Margin of error - measures the uncertainty or precision associated with estimating a population parameter (such as a population mean or proportion) by quantifying the amount by which a sample estimate might deviate from the true population parameter.

  • Confidence level - the probability that the interval will contain the true parameter value before the data are gathered

  • Critical value - The point on the distribution of the test statistic that defines a set of values that call for rejecting the null hypothesis - i.e the value which defines the rejection region of a test.

  • Statistical significance - A statistical result that is decidedly not due to “ordinary variation” in the data (i.e., not due to chance or not a coincidence)

  • Statistical test - a type of statistical inference used to decide whether an observed result is statistically significant

  • \(p\)-value - the probability of observing a value as extreme or more extreme than the test statistic under the assumption that the null hypothesis is true

  • Null hypothesis - The hypothesis of “no effect”. It usually states that the population parameter is equal to some value

  • Alternative hypothesis - The hypothesis of “effect”. It usually states that the population parameter lies in some range of values

  • Test statistic - a statistic which measures the distance between the point estimate of the parameter and the hypothesized value of the parameter under the null hypothesis

  • Rejection region - the interval, measured in the sampling distribution of the statistic under study, that leads to rejection of the null hypothesis \(H_0\) in a significance test.

  • Type I Error - a false positive test result: rejecting the null hypothesis when the null hypothesis is true.

  • Type II Error - a false negative test result: failing to reject the null hypothesis when the alternative hypothesis is true.

  • Statistical Power - the probability of rejecting the null hypothesis when the alternative is true.

  • Know what factors influence the power of a significance test and what can be done to increase the power of a significance test

  • Know the definitions and differences between Parametric vs Non-parametric significance tests

    • Know the advantages and disadvantages of Parametric vs Non-parametric significance tests

    • Know how a confidence interval relates to a two-tailed statistical test and how to use a confidence interval to make a decision about a significance test.

Confidence intervals

  • Understand the “anatomy” of a confidence interval (i.e., point estimate, standard score, standard error, and margin of error). Be able to identify each part of a confidence interval.

\[\text{point estimate}\pm \text{standard score} \times \text{standard error} \]

or

\[\text{point estimate}\pm \text{margin of error} \]

  • What effect does increasing the confidence level have on the margin of error and the confidence interval?

  • What effect does increasing the sample size have on the margin of error and the confidence interval?

  • Know the correct way(s) to report a confidence interval

    • E.g., for a \(95\%\) confidence level: “At the \(95\%\) confidence level, we estimate that the population proportion (or mean) is no less than \([\)insert lower bound\(]\) but no greater than \([\)insert upper bound\(]\).”
  • Understand why it is incorrect to talk about a confidence interval in a probabilistic way

  • Understand that the confidence level of a confidence interval relates to the long-run frequency of CIs that will cover the true value of the parameter
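The long-run coverage interpretation can be checked with a quick simulation. A minimal sketch (not from the lectures; the normal population with \(\mu = 50\), \(\sigma = 10\), sample size 30, and 10,000 repetitions are made-up values):

```python
# Simulate the long-run interpretation of a 95% CI for a mean: roughly 95%
# of intervals built from repeated samples should cover the true mu.
# All population values below are assumptions chosen for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, n_sims = 50.0, 10.0, 30, 10_000

covered = 0
for _ in range(n_sims):
    x = rng.normal(mu, sigma, size=n)
    xbar, s = x.mean(), x.std(ddof=1)
    t_crit = stats.t.ppf(0.975, df=n - 1)   # standard score
    m = t_crit * s / np.sqrt(n)             # margin of error
    if xbar - m <= mu <= xbar + m:          # does this interval cover mu?
        covered += 1

print(f"Empirical coverage: {covered / n_sims:.3f}")
```

The empirical coverage lands near 0.95, matching the stated confidence level in the long-run-frequency sense.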



Computing a confidence interval for estimating \(p\) with \(\hat{p}\)

\[ \hat{p} \pm z\times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

  • Assumptions: random sampling, requires large \(n\), only valid when

\[ n\hat{p} \geq 15 \ \ \text{and} \ \ n(1-\hat{p}) \geq 15 \]
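As a concrete sketch of this formula, with made-up counts (\(x = 240\) successes in \(n = 400\) trials) at the \(95\%\) confidence level:

```python
# Large-sample CI for a population proportion p; the counts are made up.
import numpy as np
from scipy import stats

x, n = 240, 400
p_hat = x / n
z = stats.norm.ppf(0.975)                      # standard score for 95%
se = np.sqrt(p_hat * (1 - p_hat) / n)          # standard error
m = z * se                                     # margin of error

# Check the validity condition before trusting the interval
assert n * p_hat >= 15 and n * (1 - p_hat) >= 15
print(f"95% CI for p: ({p_hat - m:.3f}, {p_hat + m:.3f})")
```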



Computing a confidence interval for estimating \(\mu\) with \(\bar{x}\)

\[ \bar{x} \pm t\times \frac{s}{\sqrt{n}} \]

  • \(t\) is the standard score from a \(t\)-distribution with \(n-1\) degrees of freedom

  • Assumptions: data come from a randomized design; the population distribution of \(x\) is approximately normal
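A short sketch of the \(t\)-based interval, using a small made-up sample:

```python
# t-based CI for a population mean mu; the data values are made up.
import numpy as np
from scipy import stats

x = np.array([9.8, 10.2, 10.4, 9.9, 10.1, 10.6, 9.7, 10.3])
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)
t = stats.t.ppf(0.975, df=n - 1)    # t score with n-1 degrees of freedom
m = t * s / np.sqrt(n)              # margin of error
print(f"95% CI for mu: ({xbar - m:.3f}, {xbar + m:.3f})")
```

The same interval can be obtained with `scipy.stats.t.interval`, which is a convenient cross-check.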

Computing Sample Size

  • Know how to compute the sample size when estimating \(p\) with \(\hat{p}\)

    \[ n = \frac{p(1-p)z^2}{m^2} \]

  • Know how to compute the sample size when estimating \(\mu\) with \(\bar{x}\)

    \[ n = \frac{\sigma^2z^2}{m^2} \]

    • \(z\) is the standard score from a standard normal distribution corresponding to the desired confidence level

    • \(m\) is the desired margin of error

    • \(p\) and \(\sigma\) must be determined from prior information or studies
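Both formulas can be evaluated directly; the planning values below (\(p = 0.3\), \(\sigma = 12\), and the desired margins of error) are assumptions standing in for prior-study information:

```python
# Sample-size calculations for estimating p and mu; planning values are assumed.
import math
from scipy import stats

z = stats.norm.ppf(0.975)    # standard score for 95% confidence

# Proportion: planning value p = 0.30, desired margin of error m = 0.04
p, m = 0.30, 0.04
n_prop = math.ceil(p * (1 - p) * z**2 / m**2)

# Mean: planning value sigma = 12, desired margin of error m = 2
sigma, m_mean = 12.0, 2.0
n_mean = math.ceil(sigma**2 * z**2 / m_mean**2)

print(n_prop, n_mean)   # always round n up to the next whole number
```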

Significance Tests

  • know the five steps to conduct a significance test:

    1. Assumptions

    2. State null and alternative hypotheses

    3. Compute test statistic

    4. Compute \(p\)-value

    5. Make a decision

  • Be sure you understand and know how to apply the decision rule in hypothesis testing

    \[\text{Decision Rule:}\begin{cases} \text{if} \ p\text{-value} \leq \alpha & \text{Reject} \ H_0 \\ \text{else} & \text{do not reject} \ H_0 \end{cases}\]

Significance for a single population proportion

  1. Assumptions: \(n\) is large; \(np_0 \geq 15\) and \(n(1-p_0) \geq 15\)

  2. State the null and alternative hypotheses

  • Right-tailed test: \(H_0: p = p_0 \ \ H_A: p > p_0\)

  • Left-tailed test: \(H_0: p = p_0 \ \ H_A: p < p_0\)

  • Two-tailed test: \(H_0: p = p_0 \ \ H_A: p \neq p_0\)

  3. Compute the test statistic

\[ z_{\text{obs}} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1) \]

  4. Compute the \(p\)-value
  • Right-tailed test: \(p\)-value = \(P(z \geq z_{\text{obs}} \ | \ H_0 \ \text{true})\)

  • Left-tailed test: \(p\)-value = \(P(z \leq z_{\text{obs}} \ | \ H_0 \ \text{true})\)

  • Two-tailed test: \(p\)-value = \(P(|z| \geq |z_{\text{obs}}| \ | \ H_0 \ \text{true})\)

  5. Compare the \(p\)-value to \(\alpha\) and make a decision
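The five steps can be sketched end to end; the data (52 successes in 80 trials, testing \(H_0: p = 0.5\) two-tailed at \(\alpha = 0.05\)) are made up:

```python
# One-proportion z test, two-tailed; counts and hypothesized value are made up.
import numpy as np
from scipy import stats

x, n, p0, alpha = 52, 80, 0.5, 0.05
p_hat = x / n

# Step 1 - assumptions: large n
assert n * p0 >= 15 and n * (1 - p0) >= 15

# Step 2 - hypotheses: H0: p = p0  vs  HA: p != p0

# Step 3 - test statistic
z_obs = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)

# Step 4 - two-tailed p-value: P(|z| >= |z_obs| | H0 true)
p_value = 2 * (1 - stats.norm.cdf(abs(z_obs)))

# Step 5 - decision
print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```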



Significance test for a single population mean

  1. Assumptions: The population distribution of the random variable \(x\) is approximately normal (this is especially important if \(n\) is small and the significance test is one-sided)

  2. State the null and alternative hypotheses

  • Right-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu > \mu_0\)

  • Left-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu < \mu_0\)

  • Two-tailed test: \(H_0: \mu = \mu_0\), \(H_A: \mu \neq \mu_0\)

  3. Compute the test statistic

\[ t_{\text{obs}} = \frac{\bar{x}-\mu_0}{s/\sqrt{n}} \sim t(n-1) \]

  4. Compute the \(p\)-value
  • Right-tailed test: \(p\)-value = \(P(t \geq t_{\text{obs}} \ | \ H_0 \ \text{true})\)

  • Left-tailed test: \(p\)-value = \(P(t \leq t_{\text{obs}} \ | \ H_0 \ \text{true})\)

  • Two-tailed test: \(p\)-value = \(P(|t| \geq |t_{\text{obs}}| \ | \ H_0 \ \text{true})\)

  5. Compare the \(p\)-value to \(\alpha\) and make a decision
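A sketch of the same five steps for a mean; the sample and \(\mu_0 = 10\) are made up:

```python
# One-sample t test, two-tailed; the data and mu0 are made up.
import numpy as np
from scipy import stats

x = np.array([10.8, 11.3, 9.9, 10.5, 11.1, 10.7, 11.4, 10.2, 10.9, 11.0])
mu0, alpha = 10.0, 0.05
n = len(x)

# Test statistic and two-tailed p-value, computed from the formulas above
t_obs = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_value = 2 * (1 - stats.t.cdf(abs(t_obs), df=n - 1))

print(f"t_obs = {t_obs:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```

`stats.ttest_1samp(x, mu0)` returns the same statistic and two-tailed \(p\)-value, and is a handy cross-check.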



The sign test

  • Understand how to form and interpret the null and alternative hypotheses for a sign test

  • be able to interpret the meaning of a positive sign (in context) and the probability of a positive sign \(P(+ \ \text{sign})\)

  • Know how to compute the test statistic for a sign test (i.e., the number of positive signs)

  • Know how to compute the \(p\)-value for a sign test

  • Know that the sign test is used for data from matched-pairs designs

  • Know how this test relates to the binomial distribution

  • Know why we don’t count zeros in the total number of signs for a sign test

Steps of a sign test:

  1. Assumptions: data are collected from a randomization scheme. Observations are independent

  2. State the null and alternative hypotheses:

  • Two-tailed test: \(H_0: p = 0.5\) \(H_A: p \neq 0.5\)

  • Right-tailed test: \(H_0: p = 0.5\) \(H_A: p > 0.5\)

  • Left-tailed test: \(H_0: p = 0.5\) \(H_A: p < 0.5\)

  • where \(p\) is the probability of a positive sign

  3. The test statistic \(S\) is the total number of positive signs, which under \(H_0\) follows a binomial distribution \(S \sim \text{binom}(n, p_0)\) with \(p_0 = 0.5\), where \(n\) is the total number of nonzero signs

  4. The \(p\)-value is computed from a binomial distribution under the null:

  • for a two-tailed test \(H_A: p \neq 0.5\):

\[2\times P(S\geq s| H_0) = 2\times \left[ 1 - P(S<s| H_0)\right]\]

\[ = 2 \times \left[ 1 - \sum_{k = 0}^{s - 1}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k}\right] \]

  • for a right-tailed test \(H_A: p > 0.5\):

\[P(S\geq s| H_0) = 1 - P(S<s| H_0)\]

\[ = 1 - \sum_{k = 0}^{s - 1}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k} \]

  • for a left-tailed test \(H_A: p < 0.5\):

\[P(S\leq s| H_0) = \sum_{k = 0}^{s}\frac{n!}{k!(n-k)!}p_0^k (1-p_0)^{n-k} \]

  5. Compare the \(p\)-value to \(\alpha\) and make a decision
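A sketch of the sign-test computation; the counts (12 positive signs out of \(n = 15\) nonzero differences, right-tailed) are made up:

```python
# Right-tailed sign test via the binomial distribution; counts are made up.
from scipy import stats

s, n = 12, 15   # positive signs, total nonzero signs (zeros are dropped)

# Under H0, S ~ binom(n, 0.5); right-tailed p-value = P(S >= s | H0)
p_value = 1 - stats.binom.cdf(s - 1, n, 0.5)

# scipy's exact binomial test gives the same answer
p_check = stats.binomtest(s, n, 0.5, alternative="greater").pvalue
print(f"p-value = {p_value:.4f}")
```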



Comparing two proportions

  • Know how to conduct a statistical test concerning \(p_1 - p_2\)

    • Know how to compute a confidence interval to estimate \(p_1 - p_2\)

    • Understand that the formulas below are for two independent samples only!

Inference for two proportions

\(100(1-\alpha)\%\) Confidence interval:

\[ (\hat{p}_1 - \hat{p}_2) \pm Z_{1-\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
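Evaluated on made-up counts (\(x_1 = 60\) of \(n_1 = 200\), \(x_2 = 45\) of \(n_2 = 250\)), the interval looks like:

```python
# 95% CI for p1 - p2 from two independent samples; counts are made up.
import numpy as np
from scipy import stats

x1, n1, x2, n2 = 60, 200, 45, 250
p1, p2 = x1 / n1, x2 / n2
z = stats.norm.ppf(0.975)

# Unpooled standard error (the CI does not pool the proportions)
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
m = z * se
print(f"95% CI for p1 - p2: ({(p1 - p2) - m:.3f}, {(p1 - p2) + m:.3f})")
```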

Hypothesis Testing:

  1. Assumptions: The two populations are independent; random sampling; requires large \(n_1\) and \(n_2\); only valid when

\[ n_i\hat{p} \geq 15 \ \ \text{and} \ \ n_i(1-\hat{p}) \geq 15 \quad \text{for} \ i = 1, 2 \]

where \(\hat{p}\) is the pooled estimate of the population proportion

  2. Hypotheses:
  • Two-tailed test: \(H_0: p_1-p_2 = 0\) \(H_A: p_1 - p_2 \neq 0\)

  • Right-tailed test: \(H_0: p_1 - p_2 = 0\) \(H_A: p_1 - p_2 > 0\)

  • Left-tailed test: \(H_0: p_1 - p_2 = 0\) \(H_A: p_1 - p_2 < 0\)

  3. Test statistic:

    \[ Z_{obs} = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}\sim N(0,1) \]

    \(\hat{p}\) is called the pooled estimate of the common proportion:

    \[\hat{p} = \frac{x_1+x_2}{n_1+n_2}\]

    where \(x_1\) and \(x_2\) are the number of observations in the “success” categories for sample 1 and sample 2.

  4. \(p\)-value:

  • Two-tailed test \(H_A: p_1 - p_2 \neq 0\): \(p\)-value \(= P(|z|\geq |Z_{obs}| \ | \ H_0) = 2\times\left[1 - P(z<|Z_{obs}| \ | \ H_0)\right]\)

  • Right-tailed test \(H_A: p_1 - p_2 > 0\): \(p\)-value \(= P(z\geq Z_{obs}| H_0) = 1 - P(z<Z_{obs}| H_0)\)

  • Left-tailed test \(H_A: p_1 - p_2 < 0\): \(p\)-value \(= P(z\leq Z_{obs}| H_0)\)
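The pooled test can be sketched with made-up counts (\(x_1 = 60\) of \(n_1 = 200\), \(x_2 = 45\) of \(n_2 = 250\), two-tailed):

```python
# Pooled two-proportion z test, two-tailed; counts are made up.
import numpy as np
from scipy import stats

x1, n1, x2, n2 = 60, 200, 45, 250
p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)   # pooled estimate of the common proportion

z_obs = (p1_hat - p2_hat) / np.sqrt(
    p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)
)
p_value = 2 * (1 - stats.norm.cdf(abs(z_obs)))   # two-tailed

print(f"z_obs = {z_obs:.3f}, p-value = {p_value:.4f}")
```

Note the test pools the proportions in its standard error while the confidence interval does not.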



Comparing two means

  • Know how to conduct and interpret a statistical test concerning \(\mu_1 - \mu_2\)

  • Know how to compute a confidence interval to estimate \(\mu_1 - \mu_2\)

  • Understand the formulas below are for two independent samples only!

\(100(1-\alpha)\%\) Confidence interval:

\[ (\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2} \times \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \]

  • where \(t_{1-\alpha/2}\) has \(\min(n_1 - 1, n_2 - 1)\) degrees of freedom
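A small sketch of this interval with two made-up samples:

```python
# CI for mu1 - mu2 with the conservative df = min(n1-1, n2-1); data are made up.
import numpy as np
from scipy import stats

x1 = np.array([5.1, 4.8, 5.6, 5.0, 5.3, 4.9])
x2 = np.array([4.2, 4.6, 4.0, 4.5, 4.3])
n1, n2 = len(x1), len(x2)

df = min(n1 - 1, n2 - 1)             # conservative degrees of freedom
t = stats.t.ppf(0.975, df=df)
se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
m = t * se
d = x1.mean() - x2.mean()
print(f"95% CI for mu1 - mu2: ({d - m:.3f}, {d + m:.3f})")
```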

Hypothesis Testing:

  1. Assumptions: The two populations are independent; the population distribution in each group is approximately normal (this is especially important if \(n\) is small and the significance test is one-sided)

  2. Hypotheses:

  • Two-tailed test: \(H_0: \mu_1-\mu_2 = 0\) \(H_A: \mu_1 - \mu_2 \neq 0\)

  • Right-tailed test: \(H_0: \mu_1 - \mu_2 = 0\) \(H_A: \mu_1 - \mu_2 > 0\)

  • Left-tailed test: \(H_0: \mu_1 - \mu_2 = 0\) \(H_A: \mu_1 - \mu_2 < 0\)

  3. Test statistic:

\[ t_{obs} = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\sim t(\min(n_1 - 1, n_2 - 1)) \]

  4. \(p\)-value:
  • Two-tailed test \(H_A: \mu_1 - \mu_2 \neq 0\): \(p\)-value \(= P(|t|\geq |t_{obs}| \ | \ H_0) = 2\times\left[1 - P(t<|t_{obs}| \ | \ H_0)\right]\)

  • Right-tailed test \(H_A: \mu_1 - \mu_2 > 0\): \(p\)-value \(= P(t\geq t_{obs}| H_0) = 1 - P(t<t_{obs}| H_0)\)

  • Left-tailed test \(H_A: \mu_1 - \mu_2 < 0\): \(p\)-value \(= P(t\leq t_{obs}| H_0)\)
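The two-sample test can be sketched the same way; both samples below are made up:

```python
# Two-sample t test with conservative df = min(n1-1, n2-1); data are made up.
import numpy as np
from scipy import stats

x1 = np.array([21.1, 22.4, 20.9, 23.0, 21.8, 22.7, 21.5, 22.1])
x2 = np.array([19.8, 20.5, 21.2, 19.9, 20.8, 20.1, 20.7])
n1, n2 = len(x1), len(x2)

se = np.sqrt(x1.var(ddof=1) / n1 + x2.var(ddof=1) / n2)
t_obs = (x1.mean() - x2.mean()) / se
df = min(n1 - 1, n2 - 1)                             # conservative df
p_value = 2 * (1 - stats.t.cdf(abs(t_obs), df=df))   # two-tailed

print(f"t_obs = {t_obs:.3f}, df = {df}, p-value = {p_value:.4f}")
```

`stats.ttest_ind(x1, x2, equal_var=False)` computes the same statistic but uses the Welch–Satterthwaite degrees of freedom instead of the conservative \(\min\) rule.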