3.3. Simulating a Randomized Trial: Vaccine Efficacy
In a drug trial, volunteers either receive the new treatment or a placebo (a fake treatment). In an A/B test of a new feature on a website, visitors either see the new feature or the usual page. In both examples, we control the assignment of volunteers and visitors to groups, and in a randomized controlled experiment, we use a chance process to make the assignment.
In drug trials, scientists often essentially use an urn model to select the subjects for the treatment, and those not selected receive the placebo. With A/B testing, we often use a systematic approach, where, for example, every other visitor to the page is shown the new feature (see the exercises to learn more about systematic sampling). We can simulate the chance mechanism of the urn to better understand variation in the outcome of an experiment and the meaning of efficacy in clinical trials.
Detroit Mayor Mike Duggan made national news in March 2021 when he turned down a shipment of over 6,000 Johnson & Johnson vaccine doses, stating that the citizens of his city should “get the best”. The mayor was referring to the efficacy rate of the vaccine, which was reported to be about 66%. In comparison, Moderna and Pfizer both reported efficacy rates of about 95% for their vaccines.
On the surface, Duggan’s reasoning seems valid, but the scope of the three clinical trials is not comparable, meaning direct comparisons of the experiments are problematic [Irfan, 2021]. Moreover, the Centers for Disease Control (CDC) considers a 66% efficacy rate quite good, which is why the vaccine was given emergency approval [Centers for Disease Control, 2021].
We consider these points in turn, beginning with scope and then efficacy.
Recall that when we evaluate the scope of the data, we consider the who, when, and where of the study. For the Johnson & Johnson clinical trial, the participants:
- included adults 18 and over, where roughly 40% had conditions, called comorbidities, associated with an increased risk for getting severe COVID-19;
- enrolled in the study from October to November 2020;
- came from 8 countries across 3 continents, including the US and South Africa.
The participants in the Moderna and Pfizer trials were primarily from the US, roughly 40% had comorbidities for severe COVID-19, and the trials took place earlier, over summer 2020. The timing and location of the trials make them difficult to compare. Cases of COVID-19 were at a low point in the US that summer, but they rose rapidly in the late fall. Also, a more contagious variant of the virus was spreading rapidly in South Africa at the time of the J&J trial.
Each clinical trial was designed to test a vaccine against the situation of no vaccine under similar circumstances through the random assignment of subjects to treatment and control groups. While the scope from one trial to the next is quite different, the randomization within a trial keeps the scope of the treatment and control groups roughly the same, which enables meaningful comparisons between groups in the same trial. But the scope was different enough across the three vaccine trials to make direct comparisons problematic.
How was the trial carried out for the Johnson & Johnson vaccine? To begin, 43,738 people enrolled in the trial [Janssen Biotech, Inc., 2021]. These participants were split into two groups at random: half received the new vaccine, and the other half received a placebo, such as a saline solution. Then everyone was followed for 28 days to see whether they contracted COVID-19.
A lot of information was recorded for each patient, such as their age, race, and sex, and, in addition, whether they caught COVID-19, including the severity of the disease. At the end of 28 days, there were 468 cases of COVID-19: 117 in the treatment group and 351 in the control group.
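As a quick arithmetic check (using only the counts reported above), the case and participant totals are consistent:

```python
treatment_cases = 117
control_cases = 351

total_cases = treatment_cases + control_cases
print(total_cases)  # 468 cases in all

# Participants who did not contract COVID-19
healthy = 43_738 - total_cases
print(healthy)  # 43,270
```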
The random assignment of patients to treatment and control gives scientists a framework to assess the effectiveness of the vaccine. The typical reasoning goes as follows:
- Begin with the assumption that the vaccine is ineffective.
- So, the 468 who caught COVID-19 would have caught it whether or not they received the vaccine.
- And, the remaining 43,270 people in the trial who did not get sick would have remained healthy whether or not they received the vaccine.
- The split of 117 sick people in treatment and 351 in control was solely due to the chance process in assigning participants to treatment or control.
We can set up an urn model that reflects this scenario and then study, via simulation, the behavior of the experimental results.
3.3.2. The Urn Model
Our urn has 43,738 marbles, one for each person in the clinical trial. Since there were 468 cases of COVID-19 among them, we label 468 marbles with 1 and the remaining 43,270 with 0. We draw half of the marbles (21,869) from the urn to receive the treatment, and the remaining half receive the placebo. The result of the experiment is simply the number of marbles marked 1 that were randomly drawn from the urn.
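This urn can be simulated directly. The sketch below (the seed and variable names are our own choices) labels 468 of 43,738 marbles with 1, shuffles them, and counts how many 1s land in a treatment group of 21,869:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen for reproducibility

# 468 marbles marked 1 (cases), 43,270 marked 0
urn = np.array([1] * 468 + [0] * 43_270)

# One random assignment: shuffle, then take the first half as treatment
rng.shuffle(urn)
treatment = urn[:21_869]
print(treatment.sum())  # cases assigned to treatment (about 234 on average)
```

Repeating the shuffle many times shows how the count of cases in the treatment group varies from one random assignment to the next.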
We can simulate this process to get a sense of how likely it would be, under these assumptions, to draw only 117 marbles marked 1 from the urn. Since we draw half of the marbles from the urn, we would expect about half of the 468, or 234, to be drawn. The simulation study gives us a sense of the variation that might result from the random assignment process. That is, the simulation can tell us what proportion of the random assignments would result in so few cases of the virus in the treatment group.
There are several key assumptions that enter into the urn model, such as the assumption that the vaccine is ineffective. The random assignment of patients to treatment enables us to carry out a simulation study. It’s important to keep track of the reliance on these assumptions: our simulation study approximates the rarity of an outcome like the one observed only under these key assumptions.
As before, we can simulate the urn model using the hypergeometric probability distribution, rather than having to program the urn sampling from scratch.
import numpy as np

# Simulate 500,000 random assignments under the assumption
# that the vaccine is ineffective
simulations_fast = np.random.hypergeometric(
    ngood=468, nbad=43270, nsample=21869, size=500000
)
[Figure: histogram of the 500,000 simulated counts, with the x-axis labeled “Cases in the Treatment Group”]
In our simulation, we repeated the process of random assignment to the treatment group 500,000 times. Indeed, not one of the 500,000 simulations produced 117 or fewer cases in the treatment group. It would be an extremely rare event to see so few cases of COVID-19 if, in fact, the vaccine was not effective. A more useful computation is the vaccine’s efficacy, which we describe next.
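To make the “not one in 500,000” claim concrete, we can compute the proportion of simulated assignments with 117 or fewer treatment-group cases (the variable names here are our own):

```python
import numpy as np

rng = np.random.default_rng(0)
simulations = rng.hypergeometric(ngood=468, nbad=43_270, nsample=21_869, size=500_000)

# Proportion of simulations with 117 or fewer cases in treatment
print(np.mean(simulations <= 117))
```

Since 117 is more than 10 standard deviations below the expected count of 234, this proportion is essentially zero.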
3.3.3. Vaccine Efficacy
Vaccine efficacy (VE) is measured by comparing the risk of disease among vaccinated and unvaccinated persons:

$$
\text{VE} = \frac{\text{risk among unvaccinated} - \text{risk among vaccinated}}{\text{risk among unvaccinated}}
$$

where the risk among the unvaccinated is the proportion of unvaccinated who contracted COVID-19 and, similarly, the risk among the vaccinated is the proportion of vaccinated who contracted COVID-19. Since the two groups are the same size, we can ignore the denominators in the risks and compute the efficacy as follows.
# (control cases - treatment cases) / control cases
(351 - 117) / 351
The Centers for Disease Control sets a standard of 50% for VE when deciding whether to adopt a new vaccine. This would be equivalent to how many cases in the treatment group?
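Working backward from the VE formula (our own arithmetic): with 468 total cases and VE = 0.5, the treatment group must have half as many cases as the control group:

```python
total_cases = 468

# VE = (c - t) / c = 0.5  implies  t = c / 2, i.e., c = 2 * t
# Then t + 2 * t = 468, so t = 468 / 3
treatment_cases_at_50_pct = total_cases // 3
print(treatment_cases_at_50_pct)  # 156

# Check: with 156 treatment and 312 control cases, VE is exactly 0.5
print((312 - 156) / 312)  # 0.5
```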
We can see from the histogram that none of the simulations yielded 156 or fewer cases in the treatment group.
Furthermore, many scientists argued that we should look at the efficacy for preventing severe cases of COVID-19. By this measure, the J&J vaccine was over 80% effective. Additionally, no deaths were observed in the treatment group.
After the problems with comparing drug trials of different scopes, and the efficacy for preventing severe cases of COVID-19, were explained, the mayor of Detroit retracted his original statement, saying, “I have full confidence that the Johnson & Johnson vaccine is both safe and effective.”
This example has shown that:
- Using a chance process in the assignment of subjects to treatments in clinical trials can help us answer what-if scenarios;
- Considering data scope can help us determine whether it is reasonable to compare figures from different datasets.
Another example of the usefulness of the urn model is in the study of measurement error.