3. Simulation and Data Design

In this chapter, we develop the theory behind the chance processes introduced in Chapter 2, which makes the concepts of bias and variation more precise. We continue to use the urn model first introduced there as an abstraction for reasoning about the accuracy of our data, and we use simulation studies to help us understand the data and make decisions based on it.

We begin with an artificial example of a population so small that we can list all the possible samples that can be drawn from it. Then, we consider simple variations on drawing marbles from the urn to extend the urn model to more complex sampling designs, like those used in complex surveys.
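To make the idea of listing every possible sample concrete, here is a minimal sketch. It assumes a hypothetical population of five units labeled A through E, each recorded as 1 or 0, and simple random samples of size 2 drawn without replacement. Because the population is tiny, `itertools.combinations` can enumerate every sample, and each sample is equally likely, so we get the exact sampling distribution of the sample proportion without any simulation.

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

# Hypothetical population of five units, each recorded as 1 or 0
population = {"A": 1, "B": 0, "C": 1, "D": 1, "E": 0}

# Every simple random sample of size 2 (without replacement, order ignored)
samples = list(combinations(sorted(population), 2))

# Under simple random sampling, each of these samples is equally likely
prob_each = Fraction(1, len(samples))

# Exact sampling distribution of the sample proportion of 1s
dist = Counter()
for s in samples:
    prop = Fraction(sum(population[u] for u in s), len(s))
    dist[prop] += prob_each

# Expected value of the sample proportion across all possible samples
expected = sum(p * v for v, p in dist.items())

print(len(samples))  # 10
print(expected)      # 3/5, the same as the population proportion
```

The expected value of the sample proportion equals the population proportion of 3/5, which is a small illustration of what it means for an estimator to be unbiased under this sampling design.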

Next, we use the urn model as a framework to design and run simulation studies that help us understand larger and more complex situations. We return to some of the examples from Chapter 2; for instance, we dive deeper into how the pollsters might have gotten their predictions for the 2016 Presidential Election wrong (Section 3.2). We use the actual votes cast in Pennsylvania to simulate the sampling variation for a poll of 1,400 voters drawn from six million. This simulation helps us uncover how response bias can skew polls, and it convinces us that collecting a lot more data would not have helped the situation (another example of big data hubris).
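A simulation of this kind might be sketched as follows. This is not the chapter's code: the vote counts below are round, made-up numbers (not the actual Pennsylvania tallies), and the response-bias mechanism (candidate A's supporters responding at 90% of the rate of others) is a hypothetical assumption, modeled by shrinking A's share of the urn we actually draw from. NumPy's `Generator.multivariate_hypergeometric` draws marbles of several colors from the urn without replacement.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical urn: round, made-up vote counts (NOT the actual Pennsylvania tallies)
counts = np.array([3_000_000, 2_900_000, 100_000])  # [candidate A, candidate B, other]
N = counts.sum()
true_margin = (counts[0] - counts[1]) / N  # A's margin over B in the population

def simulate_margins(urn, n, reps=10_000):
    """Draw `reps` polls of size n without replacement; return A-minus-B margin per poll."""
    polls = rng.multivariate_hypergeometric(urn, n, size=reps)
    return (polls[:, 0] - polls[:, 1]) / n

# Unbiased poll of 1,400: the simulated margins scatter around the true margin
unbiased = simulate_margins(counts, 1_400)

# Hypothetical response bias: A's supporters respond at 90% of the rate of others,
# modeled by shrinking A's marbles in the urn the pollster effectively draws from
biased_urn = np.array([int(counts[0] * 0.9), counts[1], counts[2]])
biased_small = simulate_margins(biased_urn, 1_400)
biased_large = simulate_margins(biased_urn, 140_000)

print(f"true margin:          {true_margin:.4f}")
print(f"unbiased, n=1,400:    {unbiased.mean():.4f} +/- {unbiased.std():.4f}")
print(f"biased,   n=1,400:    {biased_small.mean():.4f} +/- {biased_small.std():.4f}")
print(f"biased,   n=140,000:  {biased_large.mean():.4f} +/- {biased_large.std():.4f}")
```

Under these assumptions, a poll 100 times larger shrinks the spread of the biased estimates but leaves them centered on the wrong value, which is the sense in which more data does not fix response bias.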

In a second simulation study, we examine the efficacy of a COVID-19 vaccine. A designed experiment for the vaccine was carried out on over 50,000 volunteers. Abstracting the experiment to an urn model gives us a tool for studying assignment variation in randomized controlled experiments. Through simulation, we find the expected outcome of the clinical trial. Our simulation, along with careful examination of the data scope, debunks claims of vaccine ineffectiveness.
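One way to picture assignment variation is a sketch like the one below, which is not necessarily how the chapter sets up its simulation. It assumes a null model in which the vaccine has no effect: each participant is a marble, the marbles marked "got COVID-19" are fixed in advance, and random assignment simply draws half of the marbles into the vaccine group. The number of cases that land in the vaccine group then follows a hypergeometric distribution, which NumPy's `Generator.hypergeometric(ngood, nbad, nsample, size)` samples directly. All trial sizes and the "observed" count are hypothetical, illustrative values, not the actual trial's figures.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical trial sizes (illustrative only, not the actual trial's figures)
n_participants = 50_000
n_treated = 25_000   # marbles drawn into the vaccine group
n_cases = 200        # marbles marked "got COVID-19"

# Null model: the vaccine has no effect, so random assignment alone decides
# how many of the fixed cases fall into the vaccine group (hypergeometric draw)
reps = 100_000
cases_in_vaccine = rng.hypergeometric(
    n_cases, n_participants - n_cases, n_treated, size=reps
)

# Hypothetical observed count of cases in the vaccine group
observed = 10
p_value = np.mean(cases_in_vaccine <= observed)

print("average cases in vaccine group under the null:", cases_in_vaccine.mean())
print("fraction of simulated trials with <= observed cases:", p_value)
```

Under the null model, the vaccine group gets about half of the cases (about 100 here), so a count far below that in a real trial would be hard to explain by the luck of the random assignment alone.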

In addition to sampling variation and assignment variation, we also cast measurement error in terms of an urn model. We use multiple measurements taken at different times of the day to estimate the accuracy of an air quality sensor. Later, in Chapter 12, we provide a more comprehensive treatment of measurement error and instrument calibration for air quality sensors.