Lecture 19 Monday, March 18th 2024

Review of confidence intervals for \(p\):

Recall that confidence intervals are used to estimate the value of an unknown parameter with an interval of values. Confidence intervals serve two purposes: (1) they provide a range of plausible values for the parameter and (2) they provide an indication of how accurate the estimate is, i.e., how confident we are in the results. A confidence interval has the general form

\[ \text{point estimate} \ \pm \ \text{margin of error} \]

The confidence portion comes from the confidence level which states the probability that the interval will give the correct result based on the long-run frequency of the method. That is, for a confidence level of \(95\%\), for example, we can expect approximately \(95\%\) of confidence intervals to capture the value of the unknown parameter. However, for any given confidence interval, once it is computed the value of the parameter is either in the interval or not in the interval - but we do not know.
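The long-run interpretation can be checked with a small simulation. The sketch below is only an illustration under assumed values: the population mean `mu`, standard deviation `sigma`, sample size `n`, and number of repetitions `reps` are all hypothetical, and it uses the interval \(\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}\) for a mean with known \(\sigma\).

```python
import random
from statistics import NormalDist, mean

def coverage_rate(mu=50.0, sigma=10.0, n=25, level=0.95, reps=2000, seed=1):
    """Fraction of simulated z-intervals that capture the true mean mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{1 - alpha/2}
    margin = z * sigma / n ** 0.5                   # same margin for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / reps

print(coverage_rate())  # should land close to 0.95
```

Each individual interval either captures `mu` or misses it; the \(95\%\) describes the proportion of intervals that capture it over many repetitions.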

Before the break, we discussed three confidence intervals: two for a population mean and one for a population proportion. We will review the latter first. The \(100(1 - \alpha)\%\) confidence interval for estimating \(p\) with \(\hat{p}\) is given by

\[ \hat{p} \ \pm \ z_{1 - \alpha/2} \ \sqrt{\frac{\hat{p}(1- \hat{p})}{n}}\]

where \(\hat{p}\) denotes the sample proportion and

\[ m = z_{1 - \alpha/2} \sqrt{\frac{\hat{p}(1- \hat{p})}{n}}\]

denotes the margin of error.

  • Practice: Investors in the stock market are often interested in understanding the true proportion of stocks that go up and down each week. Suppose a random sample of \(150\) stocks is taken, and it is found that \(82\) of them have increased in value during the past week.

    1. Compute the margin of error \(m\) for a \(95\%\) confidence level

    2. Give the \(95\%\) confidence interval for the true proportion of stocks \(p\)

    3. If you instead wanted the \(99\%\) confidence interval for the same sample, would the margin of error be greater than, less than, or equal to the margin of error you computed in part 1?
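One way to check your answers to the practice problem is with a few lines of Python. This is a minimal sketch of the formula for \(m\) given above; the function name `proportion_ci` is our own, and the critical value \(z_{1-\alpha/2}\) comes from the standard library's `NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Return (p_hat, margin, lower, upper) for the large-sample interval."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{1 - alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin

# 82 of the 150 sampled stocks increased in value
for level in (0.95, 0.99):
    p_hat, m, lower, upper = proportion_ci(82, 150, level)
    print(f"{level:.0%}: p_hat = {p_hat:.3f}, m = {m:.3f}, "
          f"CI = ({lower:.3f}, {upper:.3f})")
```

Comparing the two printed margins answers the last part of the problem: only the critical value changes between the two confidence levels.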

Review of confidence intervals for \(\mu\):

For estimating \(\mu\) with \(\bar{x}\), we first proposed the following confidence interval

\[ \bar{x} \ \pm \ z_{1 - \alpha/2} \ \frac{\sigma}{\sqrt{n}}\]

where \(\bar{x}\) denotes the sample mean and

\[ m = z_{1 - \alpha/2} \ \frac{\sigma}{\sqrt{n}}\]

denotes the margin of error. Recall that this interval requires that we know the value of \(\sigma\) - the population standard deviation. If we do not know the value of \(\sigma\), we could instead substitute the sample standard deviation \(s\) in its place to get

\[ \bar{x} \ \pm \ z_{1 - \alpha/2} \ \frac{s}{\sqrt{n}}\]

However, as we discussed before the break, the confidence interval above, which is based on the standard normal distribution, will tend to have a lower than expected confidence level when \(n\) is small, say \(n \leq 30\). Instead, a better confidence interval that is based on the \(t\)-distribution is given by

\[ \bar{x} \ \pm \ t_{n-1, 1 - \alpha/2} \ \frac{s}{\sqrt{n}}\]

where \(t_{n-1, 1 - \alpha/2}\) is the \(1-\alpha/2\) quantile of a \(t\)-distribution with \(n-1\) degrees of freedom. The above confidence interval will give correct results regardless of sample size. Moreover, since the \(t\)-distribution converges to the standard normal distribution as \(n\) gets large, the two methods tend to give nearly identical results for \(n > 30\).

  • Practice: Consider the following data table, which gives the mean annual temperature in Central Park over 10 years of recordings starting in 2009.

Year    Mean Annual Temp
2009    54.0
2010    56.7
2011    56.4
2012    57.3
2013    55.3
2014    54.4
2015    56.7
2016    57.2
2017    56.3
2018    55.9

  • Verify that the average mean annual temp is \(\bar{x} = 56.02\) with standard deviation \(s \approx 1.13\)

  • Give the \(95\%\) confidence interval for the mean annual temperature
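The practice problem can be worked through in Python as a check. This is a sketch, not the only way to do it: the critical value \(t_{9,\,0.975} \approx 2.262\) is hard-coded from a standard \(t\)-table, because Python's standard library has no quantile function for the \(t\)-distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Mean annual temperatures in Central Park, 2009-2018 (from the table above)
temps = [54.0, 56.7, 56.4, 57.3, 55.3, 54.4, 56.7, 57.2, 56.3, 55.9]

n = len(temps)
xbar = mean(temps)
s = stdev(temps)  # sample standard deviation (divides by n - 1)

# Assumption: t_{9, 0.975} read from a t-table with n - 1 = 9 degrees of freedom
T_CRIT_9_DF_95 = 2.262

margin = T_CRIT_9_DF_95 * s / sqrt(n)
print(f"xbar = {xbar:.2f}, s = {s:.2f}, m = {margin:.2f}")
print(f"95% CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

Swapping the \(t\) critical value for \(z_{0.975} \approx 1.96\) gives a narrower interval, which illustrates why the normal-based interval tends to have a lower than expected confidence level for a sample this small.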

Factors that influence the margin of error:

When we compute a confidence interval, we would like the interval to be as narrow as possible. Why? Simply put, because we want the most precise estimate of the parameter that we can get. A narrow interval is the result of a small margin of error. However, several factors affect the size of the margin of error of an estimate. The margin of error of a confidence interval decreases as:

  1. The confidence level decreases

  2. The population standard deviation \(\sigma\) decreases

  3. The sample size \(n\) increases

See this link here to check these properties yourself!

First, we typically do not want to decrease the confidence level. By convention, most confidence intervals reported in academic journals have a confidence level of at least \(90\%\). This is because we want to retain a high degree of confidence that our estimate is accurate. Second, we have no control over the value of the population standard deviation: \(\sigma\) is a property of the population. If \(\sigma\) is inherently large for the population under study, then we are out of luck in that regard.

However, we are fortunate in that we can control the sample size of our study. Recall that the standard deviations of the sampling distributions of \(\bar{x}\) and \(\hat{p}\) both involve division by \(\sqrt{n}\), the square root of the number of observations in the sample. Therefore, we can reduce the margin of error of our estimate of \(\mu\) or \(p\) by increasing \(n\) - the number of observations in the sample.
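To see the effect of each factor numerically, here is a small sketch using the margin of error for a mean with known \(\sigma\). The value `sigma = 10.0` and the sample sizes are hypothetical, chosen only to make the pattern visible.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level=0.95):
    """Margin of error z_{1 - alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

# Larger n -> smaller margin; quadrupling n cuts the margin in half
for n in (25, 100, 400):
    print(f"n = {n:3d}: m = {margin_of_error(sigma, n):.2f}")

# Lower confidence level -> smaller margin (n held fixed at 100)
for level in (0.99, 0.95, 0.90):
    print(f"level = {level:.0%}: m = {margin_of_error(sigma, 100, level):.2f}")
```

Because \(n\) enters through \(\sqrt{n}\), cutting the margin of error in half requires four times as many observations, not twice as many.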

Choosing a minimum sample size

While it is nice that we can simply choose to take a large number of observations from the population to get a small margin of error, real studies and surveys cost time, money, and effort on the part of the researchers. Therefore, we want to avoid taking an arbitrarily large number of observations. Instead, we would like to take only as many observations as needed to achieve a certain precision in our estimate.

Luckily, with a little bit of algebra we can rearrange the formula for the margin of error to determine the sample size required to achieve a desired level of precision.

How do we choose \(n\) for estimating \(p\) with \(\hat{p}\)? Consider that

\[ m = z_{1-\alpha/2} \ \sqrt{\frac{p(1-p)}{n}} \Leftrightarrow n = \frac{(z_{1-\alpha/2})^2 p(1-p)}{m^2}\]

where \(m\) denotes the margin of error, \(p\) is the population proportion, and \(z_{1-\alpha/2}\) is the standard score corresponding to the desired \(100(1-\alpha)\%\) confidence level. Keep in mind that the formula above is used to determine the sample size needed when we are designing a study to investigate the value of an unknown population proportion. So how do we choose the value of \(p\)? We have essentially two options:

  1. We can look for related studies or prior research to inform the value of \(p\)

  2. We can use \(p = 0.5\) which gives the upper bound on the sample size:

\[ n = \frac{z^2 p(1-p)}{m^2} \leq \frac{z^2 (0.5)(1-0.5)}{m^2} = \frac{z^2}{4m^2} \]

The latter option is a conservative approach we can use when we have no prior information regarding the value of the population proportion.

  • Try it out: The sensitivity of a medical test is the probability of a positive result when the condition (e.g., disease) is present. Rapid strep tests have a sensitivity of about \(0.95\). A new rapid strep test has been developed but its sensitivity is unknown. How many tests would be necessary to estimate its sensitivity with a margin of error of \(0.02\) at a confidence level of \(95\%\)?
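A short sketch of the sample-size formula for a proportion is given below. The function name `sample_size_proportion` is our own, and the result is rounded up because \(n\) must be a whole number at least as large as the formula requires. Using \(0.95\) (the sensitivity of existing rapid strep tests) as the planning value for \(p\) is one reasonable reading of the exercise, not the only one; the conservative choice \(p = 0.5\) is shown for comparison.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(margin, level=0.95, p_guess=0.5):
    """Smallest n with z^2 * p(1 - p) / n <= margin^2, for a planning value p_guess."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{1 - alpha/2}
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# Planning value taken from prior knowledge (assumption: p is near 0.95)
print(sample_size_proportion(0.02, level=0.95, p_guess=0.95))

# Conservative choice when nothing is known about p
print(sample_size_proportion(0.02, level=0.95))
```

The conservative answer is several times larger, which shows how much a credible prior value of \(p\) can save when \(p\) is far from \(0.5\).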

How do we choose \(n\) for estimating \(\mu\) with \(\bar{x}\)? Consider that

\[ m = z_{1-\alpha/2} \ \frac{\sigma}{\sqrt{n}} \Leftrightarrow n = \frac{(z_{1-\alpha/2})^2 \sigma^2}{m^2}\]

Notice that the above equation for \(n\) requires that we know \(\sigma\). So how can we choose the value of \(\sigma\)? Again we have two options:

  1. Use prior research/knowledge to inform the value of \(\sigma\)

  2. We can use an educated guess about the possible range of values of the variable and estimate \(\sigma\) via

\[ \sigma \approx \frac{\text{range}}{4} \]

  • Try it out: A researcher wants to estimate the mean number of hours college students spend scrolling TikTok each week. Previous studies suggest that the standard deviation of the weekly hours spent on TikTok is approximately 2.5 hours. The researcher desires a margin of error of no more than 0.5 hours. Compute the sample size the researcher needs to estimate the mean number of hours within the desired margin of error.

  • Try it out: A marketing research firm wants to estimate the average amount a student spends during spring break. They want to determine it to within \(\$120\) with \(90\%\) confidence. One can roughly say that the amount a student spends ranges from \(\$100\) to \(\$1700\). About how many students should they sample?
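Both exercises above can be checked with a sketch of the sample-size formula for a mean. Two assumptions are worth flagging: the TikTok exercise does not state a confidence level, so \(95\%\) is assumed here purely for illustration, and the spring break exercise uses the rule of thumb \(\sigma \approx \text{range}/4\). The function name `sample_size_mean` is our own.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(margin, sigma, level=0.95):
    """Smallest n with z_{1 - alpha/2} * sigma / sqrt(n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z * sigma / margin) ** 2)

# TikTok hours: sigma = 2.5, m = 0.5; the 95% level is an assumption
print(sample_size_mean(0.5, 2.5, level=0.95))

# Spring break spending: sigma guessed as range / 4, m = 120, 90% confidence
sigma_guess = (1700 - 100) / 4
print(sample_size_mean(120, sigma_guess, level=0.90))
```

Rounding up rather than to the nearest integer guarantees that the achieved margin of error is no larger than the one requested.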

Assumptions of estimators

When we compute the confidence intervals that we have learned above we make a few assumptions about the data and the estimator:

  • We assume the data constitute a random sample

  • We assume that the statistic is unbiased

  • We assume an infinite population size. Note that under small, finite populations the formulas for the confidence intervals will be different.

  • We assume that we can compute (approximately) the correct standard error

  • We assume the shape of the sampling distribution is approximately normal.

To have a sampling distribution that is (approximately) normal in shape, we need at least one of the following two conditions:

  1. The population distribution of the variable needs to be (approximately) normal in shape

  2. The sample size \(n\) is sufficiently large.

The sampling distribution of \(\hat{p}\)

  • The shape of the population distribution is never normal

  • The sample size is “sufficiently large” if both \(np \geq 15\) and \(n(1-p) \geq 15\).
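The large-sample condition for \(\hat{p}\) is easy to check in code. The helper below is a hypothetical convenience function; in practice \(p\) is unknown, so the check is usually carried out with \(\hat{p}\) (or a hypothesized value) in its place, which is an assumption of this sketch.

```python
def large_enough_for_proportion(n, p, threshold=15):
    """True if both n*p and n*(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Stock example: 82 increases and 68 non-increases out of 150
print(large_enough_for_proportion(150, 82 / 150))  # True

# A rare event with a modest sample: n*p = 2 falls short of 15
print(large_enough_for_proportion(100, 0.02))      # False
```

When the check fails, the normal-based interval for \(p\) should not be trusted without a larger sample.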

The sampling distribution of \(\bar{x}\)

  • The shape of the population distribution might be normal - you can check this by plotting the data

  • The sampling distribution will be approximately normal if \(n\) is large enough

Lecture 20 Wednesday, March 20th 2024

Introduction to tests of significance

Confidence intervals are an appropriate tool when the goal is to estimate something about a population. The second part of statistical inference deals with a very different sort of goal. Significance tests (also called hypothesis tests) combine mathematics and logic to assess the validity of a claim about a population parameter using the evidence provided in a set of data. Specifically, we compare properties of a set of observed data to a hypothesis about the population. The hypothesis is a statement that asserts something about the value of the population parameter. The results of the test are expressed in terms of a probability that measures the degree to which the data agree with the claim laid out by the hypothesis.

Consider the following example: Imagine you are a quality control manager at an electronics company that produces microchips for personal computers. To ensure quality, the company tests all chips before they are sent out to retailers. The company claims that, thanks to their post-production testing, only about \(2\%\) of their microchips are defective, but recent reports about customer feedback indicate that the chips might have an issue that causes the defective rate to be higher than \(2\%\). To address these concerns, you decide to conduct a study to test the claim that only \(2\%\) of chips are defective. To test this, the company purchases \(10\) computers from a random sample of \(5\) retailers to which it supplies microchips. Each computer is run through the same set of high intensity tasks to test the chips and the proportion of chips that fail is estimated. From this experiment, you estimate the proportion of defective chips to be about \(2.053\%\) with a standard error of \(0.019\%\), which gives a difference in rates of \(0.053\%\), or \(0.00053\) as a proportion.

  • From the estimate, can you conclude that the proportion of defective chips is greater than the expected rate of \(2\%\)?

One way to answer this question is to compute the probability of obtaining a difference as large or larger than the one you have observed, assuming the true defective rate is \(2\%\). This probability is \(0.48\). Since this probability is quite large, we conclude that observing a difference of \(0.00053\) is not surprising if the true defective rate is \(2\%\). In this case, the sample is consistent with the initial claim that the defective rate is about \(2\%\).

A statistically significant result is one that is decidedly not due to “ordinary variation” in the data (i.e., not due to chance or not a coincidence). Significance tests are how we decide whether or not an observed result is statistically significant.

Consider a coin that is flipped \(n\) times. The population distribution for a single flip is given below:

X        Probability(X)
Heads    \(p\)
Tails    \(1-p\)

The value of \(p\) implies something about the coin:

  1. If \(p = 0.5\) the coin is fair

  2. If \(p \neq 0.5\) the coin is unfair.

Assume we do not know the value of \(p\). We flip the coin \(30\) times to produce a sample of \(n = 30\) observations. It comes up heads \(20\) times, so \(\hat{p} = 20/30 = 2/3 \approx 0.67\). What might we decide about the value of \(p\)?

  1. Conclude that \(p =0.5\). The result that \(\hat{p} = 2/3\) is not statistically significant.

  2. Conclude that \(p \neq 0.5\). The result that \(\hat{p} = 2/3\) is statistically significant.

The probability of observing a proportion of heads at least as extreme as \(\hat{p} = 2/3\), assuming \(p = 0.5\), is about \(0.067\). What might we conclude about the coin?

In this example, the probability we obtained is fairly small. There are two explanations for this probability:

  1. We have observed something unusual (i.e., the coin is fair and we just got a large proportion of heads by coincidence)

  2. The assumption that underlies the calculation is not true (i.e., the coin is not fair and the population proportion \(p\) is not equal to \(1/2\))

So how unusual does a result need to be for us to consider the latter explanation (2) to be more plausible? We will discuss this more later on, but, in general, this is an arbitrary choice that we must make when setting up our test.
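The notes do not say how the probability for the coin example was computed. One calculation that gives a value close to it is a two-sided normal approximation to the sampling distribution of \(\hat{p}\) under \(p = 0.5\); the sketch below shows that calculation, on the assumption that this is the intended method.

```python
from math import sqrt
from statistics import NormalDist

n, heads, p0 = 30, 20, 0.5      # sample size, observed heads, assumed value of p
p_hat = heads / n

# Standard deviation of p_hat if the coin really is fair (p = p0)
se0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Probability of a sample proportion at least this far from p0, in either direction
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, two-sided probability = {p_two_sided:.3f}")
```

An exact calculation based on the binomial distribution gives a somewhat different number for a sample this small, so the value here should be read as an approximation.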

In both of these examples, we stated a hypothesis about the value of the parameter and then proposed a counter hypothesis about that value. A significance test is a comparison between two such hypotheses about the value of a parameter: a null hypothesis and an alternative hypothesis.

The null hypothesis (denoted \(H_0\)) is usually the hypothesis of “no effect” or that “nothing interesting is happening”. The null hypothesis is usually that the population parameter equals some value, \(H_0: \theta = \theta_0\), for some constant \(\theta_0\).

The alternative hypothesis (denoted \(H_A\)) is the hypothesis of “effect” or that “something interesting has happened”. The alternative hypothesis is usually that the population parameter falls in some range of values such as:

  1. The value of the parameter is greater than the value proposed in the null hypothesis:

\[H_A: \theta > \theta_0 \]

  2. The value of the parameter is less than the value proposed in the null hypothesis:

\[H_A: \theta < \theta_0 \]

  3. The value of the parameter is different from the value proposed in the null hypothesis (it could be greater than or less than - we aren’t sure, we just suspect it’s different):

\[H_A: \theta \neq \theta_0 \]

  • The alternative hypothesis expresses the “suspicions” that we have about the data. For example, we suspect the defect rate is higher than \(2\%\) or we suspect the coin is not fair.
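Written in this notation, the two examples from this lecture would plausibly be set up as follows, with \(p\) denoting the proportion of defective chips in the first line and the probability of heads in the second:

\[ H_0: p = 0.02 \quad \text{versus} \quad H_A: p > 0.02 \]

\[ H_0: p = 0.5 \quad \text{versus} \quad H_A: p \neq 0.5 \]

The microchip example uses a one-sided alternative because the concern is specifically that the defect rate is too high, while the coin example uses a two-sided alternative because an unfair coin could favor either heads or tails.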