
16.4. Basics of Confidence Intervals

We have seen that modeling leads to estimates, such as the typical time that a bus is late (Chapter 5), a humidity adjustment to an air quality measurement (Chapter 15), or an estimate of vaccine efficacy (Chapter 2). These examples are point estimates for unknown values, called parameters: the median lateness of the bus is 0.74 minutes; the humidity adjustment to air quality is 0.21 PM2.5 per humidity percentage point; and the ratio of COVID infection rates in vaccine efficacy is 0.67. However, a different sample would have produced a different estimate. Simply providing a point estimate doesn’t give a sense of the estimate’s precision. An interval estimate, on the other hand, can reflect that precision. These intervals typically take one of two forms:

  1. A bootstrap confidence interval created from the percentiles of the bootstrap sampling distribution;

  2. A normal confidence interval constructed using the standard error (SE) of the sampling distribution and additional assumptions about the distribution having the shape of a normal curve.

We describe these two types of intervals and then give an example.

Recall that the sampling distribution (see {numref}`Figure %s <fig:triptych>`) is a probability distribution that reflects the chance of observing different values of \(\hat{\theta}\). Confidence intervals are constructed from the spread of the sampling distribution of \(\hat{\theta}\), so the endpoints of the interval are random because they are based on \(\hat{\theta}\). These intervals are designed so that 95% of the time the interval covers \(\theta^*\).

As its name suggests, the percentile-based bootstrap confidence interval is created from the percentiles of the bootstrap sampling distribution. Specifically, we compute the quantiles of the sampling distribution of \(\hat{\theta}_B\), where \(\hat{\theta}_B\) is the bootstrapped statistic. For a 95% percentile interval, we identify the 2.5th and 97.5th percentiles, called \(q_{2.5,B}\) and \(q_{97.5,B}\), respectively, where 95% of the time the bootstrapped statistic is in the interval:

\[ q_{2.5,B} \leq \hat{\theta}_B~ \leq ~ q_{97.5,B}.\]

This bootstrap percentile confidence interval is considered a quick-and-dirty interval. There are many alternatives that adjust for bias, take into consideration the shape of the distribution, and are better suited to small samples. See Hesterberg for examples.
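As a concrete sketch of this recipe, the snippet below computes a 95% percentile interval for a median. The data here are hypothetical (an exponential sample stands in for observed data); only the resample-and-take-percentiles pattern carries over:

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical observed sample; in practice this is your data
sample = rng.exponential(scale=2.0, size=500)

# Bootstrap: resample with replacement and recompute the statistic
boot_medians = np.array([
    np.median(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(10_000)
])

# 95% percentile interval: the 2.5th and 97.5th percentiles
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"95% bootstrap percentile CI for the median: [{lower:.2f}, {upper:.2f}]")
```

The same pattern works for any statistic: swap `np.median` for the estimator of interest.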

The percentile confidence interval does not rely on the sampling distribution having a particular shape or the center of the distribution being \(\theta^*\). In contrast, the normal confidence interval often doesn’t require bootstrapping to compute, but it does make additional assumptions about the shape of the sampling distribution of \(\hat{\theta}\).

We use the normal confidence interval when the sampling distribution is well-approximated by a normal curve. For a normal probability distribution, with center \(\mu\) and spread \(\sigma\), there is a 95% chance that a random value from this distribution is in the interval \(\mu ~\pm ~ 1.96 \sigma\). Since the center of the sampling distribution is typically \(\theta^*\), the chance is 95% that for a randomly generated \(\hat{\theta}\):

\[|\hat{\theta} -\theta^*| \leq 1.96 SD(\hat{\theta}),\]

where \(SD(\hat{\theta})\) is the spread of the sampling distribution of \(\hat{\theta}\). We use this inequality to make a 95% confidence interval for \(\theta^*\):

\[ [ \hat{\theta} ~-~ 1.96 SD(\hat{\theta}),~~~ \hat{\theta} ~ +~ 1.96 SD(\hat{\theta})]\]

Confidence intervals of other sizes can be formed with different multiples of \(SD(\hat{\theta})\), all based on the normal curve. For example, a 99% confidence interval is \(\hat{\theta} \pm 2.58 SD(\hat{\theta})\), and a one-sided upper 95% confidence interval is \([ \hat{\theta} ~-~ 1.64 SD(\hat{\theta}),~~ \infty)\).
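The multipliers 1.96, 2.58, and 1.64 all come from tail areas of the standard normal curve. A quick check with Python’s standard library `statistics.NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1
for coverage in (0.90, 0.95, 0.99):
    # Two-sided multiplier: leaves (1 - coverage)/2 probability in each tail
    z = std_normal.inv_cdf(1 - (1 - coverage) / 2)
    print(f"{coverage:.0%} two-sided multiplier: {z:.2f}")
```

Note that the 90% two-sided multiplier and the 95% one-sided multiplier coincide (1.64), since both leave 5% in a single tail.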

Note

Confidence intervals can easily be misinterpreted as the chance that the parameter \(\theta^*\) is in the interval. However, a confidence interval is created from one realization of the sampling distribution. The sampling distribution gives us a different probability statement: 95% of the time, an interval constructed in this way will contain \(\theta^*\). Unfortunately, we don’t know whether this particular interval is one of those that happen 95 times in 100 or not. That is why the term “confidence” is used rather than “probability” or “chance,” and we say that we are 95% confident that the parameter is in our interval.
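We can illustrate this coverage property with a small simulation. In the hypothetical setup below, we know \(\theta^*\) (the mean of a normal population), repeatedly draw samples, build a 95% normal confidence interval for the mean from each, and count how often the intervals cover the truth:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = 5.0          # the "true" parameter, known only in a simulation
n, n_trials = 100, 2_000

covered = 0
for _ in range(n_trials):
    sample = rng.normal(loc=theta_star, scale=3.0, size=n)
    theta_hat = sample.mean()
    # SD of the sampling distribution of the mean, estimated from the sample
    sd_theta_hat = sample.std() / np.sqrt(n)
    lower = theta_hat - 1.96 * sd_theta_hat
    upper = theta_hat + 1.96 * sd_theta_hat
    covered += (lower <= theta_star <= upper)

print(f"Fraction of intervals covering theta*: {covered / n_trials:.3f}")
```

The fraction lands close to 0.95, but any one interval either covers \(\theta^*\) or it doesn’t, which is exactly the point of the note above.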

Note

The SD of a parameter estimate is often called the standard error, or SE, to distinguish it from the SD of a sample, a population, or one draw from an urn. In this book, we don’t differentiate between them; we call them all SDs.

We provide an example of each type of interval next.

16.4.1. Confidence intervals for a coefficient

Earlier in this chapter we tested the hypothesis that the coefficient for humidity in a linear model for air quality is 0. The fitted coefficient given the data was \(0.21\). Since the null model did not completely specify the data generation mechanism, we resorted to bootstrapping. That is, we used the data as the bootstrap population, took a sample of 11,226 records with replacement from it, and fitted the model to find the bootstrap sample coefficient for humidity. Our simulation repeated this process 10,000 times to get an approximate bootstrap sampling distribution, which we display again below.
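The bootstrap loop can be sketched as follows. Since the air quality data aren’t reproduced here, this sketch uses synthetic stand-in data and fewer repetitions; the name `boot_theta_hat` mirrors the variable used in this section, but the data, sample size, and helper function are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the air quality data (not the real dataset)
n = 1_000
humidity = rng.uniform(20, 90, size=n)
pm25 = 10 + 0.21 * humidity + rng.normal(scale=5, size=n)
X = np.column_stack([np.ones(n), humidity])  # intercept + humidity

def fit_humidity_coef(X, y):
    # Least-squares fit; return the slope on humidity
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Bootstrap: resample rows with replacement, refit, keep the coefficient
boot_theta_hat = np.array([
    fit_humidity_coef(X[idx], pm25[idx])
    for idx in rng.integers(0, n, size=(2_000, n))
])

print(f"Center of the bootstrap sampling distribution: {boot_theta_hat.mean():.3f}")
```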

(Figure: histogram of the bootstrap sampling distribution of the humidity coefficient \(\hat{\theta}_B\))

We can use the percentiles of this bootstrap sampling distribution to create a 99% confidence interval for \(\theta^*\). To do this, we find the 0.5th and 99.5th percentiles, \(q_{0.5}\) and \(q_{99.5}\), of the bootstrap sampling distribution.

q_995 = np.percentile(boot_theta_hat, 99.5, method='lower')
q_005 = np.percentile(boot_theta_hat, 0.5, method='lower')

print(f"Lower 0.5th percentile: {q_005:.3f} \nUpper 99.5th percentile: {q_995:.3f}")
Lower 0.5th percentile: 0.100 
Upper 99.5th percentile: 0.259

Alternatively, since the histogram of the sampling distribution looks roughly normal in shape, we can create a 99% confidence interval based on the normal distribution. First, we find \(SD(\hat{\theta})\), which we estimate with the standard deviation of the bootstrap sampling distribution:

SD = np.std(boot_theta_hat)
SD
0.026528863870312752

Then, a 99% confidence interval for \(\theta^*\) is \(2.58\) SDs away from the observed \(\hat{\theta}\) in either direction:

print(f"Lower 0.5th endpoint: {theta2_hat - (2.58 * SD):.3f} \nUpper 99.5th endpoint: {theta2_hat + (2.58 * SD):.3f}")
Lower 0.5th endpoint: 0.138 
Upper 99.5th endpoint: 0.275

These two intervals (bootstrap percentile and normal) are close, but clearly not identical. We might expect this given the slight asymmetry in the bootstrapped sampling distribution.

There are other versions of the normal-based confidence interval that reflect the inaccuracy in estimating the standard error of the sampling distribution, and still other confidence intervals for statistics that are percentiles rather than averages. If you are interested in learning more, see XXX. (Also note that for permutation tests, the bootstrap tends not to be as accurate as normal approximations.)

Confidence intervals and hypothesis tests are related in the following way: if, say, a 95% confidence interval does not contain the hypothesized value of \(\theta^*\), then the \(p\)-value for the test would be less than 5%. That is, we can invert a confidence interval to create a hypothesis test. We used this technique in the previous section when we carried out the test that the coefficient for humidity in the air quality model is 0. In this section, we have created a 99% confidence interval for the coefficient (based on the bootstrap percentiles), and since 0 does not belong to the interval, the \(p\)-value is less than 1% and statistical logic would lead us to conclude that the coefficient is not 0.
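This duality is simple to express in code. The sketch below uses the endpoints of the 99% bootstrap percentile interval computed earlier in this section:

```python
def rejects(theta_0, lower, upper):
    # A hypothesized value theta_0 is rejected at level alpha exactly
    # when it falls outside the (1 - alpha) confidence interval
    return not (lower <= theta_0 <= upper)

# 99% percentile interval for the humidity coefficient from this section
print(rejects(0, 0.100, 0.259))    # prints True: reject theta* = 0 at the 1% level
print(rejects(0.2, 0.100, 0.259))  # prints False: 0.2 is consistent with the data
```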

Another kind of interval estimate is the prediction interval. Prediction intervals focus on the variation in observations, rather than the variation in an estimator. We explore these next.