# 2.1. Dewey Defeats Truman

In the 1948 US Presidential election, New York Governor Thomas Dewey ran against the incumbent Harry Truman. As usual, a number of polling agencies conducted polls of voters in order to predict which candidate was more likely to win the election.

## 2.1.1. 1936: A Previous Polling Catastrophe

In 1936, three elections prior to 1948, the *Literary Digest* infamously
predicted a landslide defeat for Franklin Delano Roosevelt. To make this
claim, the magazine polled a sample of over 2 million people based on telephone
and car registrations. As you may know, this sampling scheme suffers from
sampling bias: those with telephones and cars tend to be wealthier than those
without. In this case, the sampling bias was so great that the *Literary
Digest* predicted Roosevelt would receive only 43% of the popular vote when he
ended up with 61%, a difference of almost 20 percentage points and the
largest error ever made by a major poll. The *Literary Digest* went out of
business soon after.
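
To see how a biased sampling frame can defeat even a very large sample, here is a minimal simulation sketch. Every number in it (the share of wealthy voters, the support rates, and the chance of appearing on a telephone or car registration list) is a hypothetical value chosen for illustration, not historical data, and `simulate_frame_bias` is a made-up helper name.

```python
import random


def simulate_frame_bias(n_voters=200_000, seed=0):
    """Compare a candidate's true vote share with a poll drawn from a biased frame.

    Every rate below is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    total_support = 0   # supporters in the whole population
    frame_size = 0      # voters who appear on a phone or car list
    frame_support = 0   # supporters among those listed voters

    for _ in range(n_voters):
        wealthy = rng.random() < 0.30
        # Assumed: wealthier voters support the candidate less often.
        supports = rng.random() < (0.40 if wealthy else 0.70)
        # Assumed: wealthier voters are far more likely to be listed.
        listed = rng.random() < (0.80 if wealthy else 0.20)

        total_support += supports
        if listed:
            frame_size += 1
            frame_support += supports

    true_share = total_support / n_voters
    poll_share = frame_support / frame_size
    return true_share, poll_share, frame_size


true_share, poll_share, frame_size = simulate_frame_bias()
print(f"True share of the vote:       {true_share:.3f}")
print(f"Poll of {frame_size:,} listed voters: {poll_share:.3f}")
```

Under these assumptions the poll reaches tens of thousands of voters, yet it should underestimate the candidate's support by roughly ten percentage points, because the listed voters differ systematically from the population as a whole.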

## 2.1.2. 1948: The Gallup Poll

Determined to learn from past mistakes, the Gallup Poll used a method called
*quota sampling* to predict the results of the 1948 election. In their sampling
scheme, each interviewer polled a set number of people from each demographic
class. For example, the interviewers were required to interview both males and
females of different ages, ethnicities, and income levels to match the
demographics in the US Census. This ensured that the poll would not leave out
important subgroups of the voting population.

Using this method, the Gallup Poll predicted that Thomas Dewey would earn 5
percentage points more of the popular vote than Harry Truman would. This
difference was large enough that the *Chicago Tribune* famously printed the
headline “Dewey Defeats Truman”.

As we now know, Truman ended up winning the election. In fact, he won with roughly 5 percentage points more of the popular vote than Dewey! What went wrong with the Gallup Poll?

## 2.1.3. The Problem With Quota Sampling

Although quota sampling did help pollsters reduce sampling bias, it introduced bias in another way. The Gallup Poll told its interviewers that, as long as they fulfilled their quotas, they could interview whomever they wished. The choice of respondents within each demographic class was therefore left to the interviewers' judgment. One possible explanation for why the interviewers ended up polling a disproportionate number of Republicans is that, at the time, Republicans were on average wealthier and more likely to live in nicer neighborhoods, making them easier to interview. This explanation is consistent with the fact that the Gallup Poll predicted 2 to 6 percentage points more Republican votes than the actual results for the 3 elections prior.
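
The mechanism described above can be sketched in a small simulation. This is a toy model under stated assumptions, not a reconstruction of Gallup's procedure: the income classes, their shares, the "easy to reach" flag, the support rates, and the helper names `make_population` and `quota_sample` are all hypothetical.

```python
import random

# Hypothetical income classes and their population shares (illustrative only).
STRATA_SHARES = {"low": 0.50, "middle": 0.35, "high": 0.15}


def make_population(n_voters, rng):
    """Build voters as (easy_to_reach, votes_republican) pairs per income class."""
    names = list(STRATA_SHARES)
    weights = list(STRATA_SHARES.values())
    population = {name: [] for name in names}
    for _ in range(n_voters):
        stratum = rng.choices(names, weights=weights)[0]
        easy = rng.random() < 0.50
        # Assumed: easy-to-reach voters lean more Republican.
        republican = rng.random() < (0.55 if easy else 0.40)
        population[stratum].append((easy, republican))
    return population


def quota_sample(population, n_sample, rng):
    """Fill each class quota, letting interviewers favor easy-to-reach voters."""
    sample = []
    for name, share in STRATA_SHARES.items():
        quota = round(n_sample * share)
        candidates = population[name][:]
        rng.shuffle(candidates)
        filled = 0
        for easy, republican in candidates:
            if filled == quota:
                break
            # Assumed: hard-to-reach voters are interviewed only 1 time in 3.
            if easy or rng.random() < 1 / 3:
                sample.append(republican)
                filled += 1
    return sample


rng = random.Random(1948)
population = make_population(200_000, rng)
all_votes = [rep for voters in population.values() for _, rep in voters]
true_share = sum(all_votes) / len(all_votes)

sample = quota_sample(population, 20_000, rng)
quota_share = sum(sample) / len(sample)
print(f"True Republican share: {true_share:.3f}")
print(f"Quota sample estimate: {quota_share:.3f}")
```

The sample meets every income quota exactly, yet under these assumptions it should overstate the Republican share by a few percentage points, because the choice of respondents within each class was left to the interviewer.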

These examples highlight the importance of understanding sampling bias as much
as possible during the data collection process. Both the *Literary Digest* and
the Gallup Poll made the mistake of assuming their methods were unbiased when
their sampling schemes were based on human judgement all along.

We now rely on **probability sampling**, a family of sampling methods that
assigns precise probabilities to the appearance of each sample, to reduce bias
as much as possible in our data collection process.
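
As a small illustration of what "assigning precise probabilities to each sample" can mean, the sketch below uses a simple random sample, one member of the probability sampling family. The four-voter population and its names are hypothetical. Because every possible sample of two voters is equally likely, we can list all of them and check that the estimate is correct on average.

```python
from fractions import Fraction
from itertools import combinations
import random

# Hypothetical four-voter population: 1 = votes for candidate A, 0 = does not.
population = {"Ana": 1, "Ben": 1, "Cam": 0, "Dee": 1}
sample_size = 2

# Under simple random sampling, every possible sample of two voters
# is equally likely, so each one has probability 1 / (number of samples).
all_samples = list(combinations(population, sample_size))
prob_each = Fraction(1, len(all_samples))


def sample_share(sample):
    """Share of voters in the sample who support candidate A."""
    return Fraction(sum(population[name] for name in sample), len(sample))


# Average of the sample estimates, weighted by each sample's probability.
expected_estimate = sum(prob_each * sample_share(s) for s in all_samples)
true_share = Fraction(sum(population.values()), len(population))

# Drawing one actual sample: random.sample selects without replacement.
rng = random.Random(0)
drawn = rng.sample(list(population), sample_size)

print(f"Number of possible samples: {len(all_samples)}")
print(f"Probability of each sample: {prob_each}")
print(f"Expected estimate: {expected_estimate}, true share: {true_share}")
print(f"One drawn sample: {drawn}")
```

Any single sample can still miss the true share, but because the selection probabilities are known, the average over all possible samples equals the population value, and no interviewer judgment enters the selection.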

## 2.1.4. Big Data?

In the age of Big Data, we are tempted to deal with bias by collecting more data. After all, we know that a census will give us perfect estimates; shouldn’t a very large sample give almost perfect estimates regardless of the sampling technique?

We will return to this question after discussing probability sampling methods to compare the two approaches.