4. Modeling and Summary Statistics

Essentially, all models are wrong, but some are useful.

George Box, Statistician (1919-2013)

A model is an idealized representation of a system. You probably use models all the time. For instance, a weather forecast is a model. A weather forecast uses past weather, current conditions, and the physics of the atmosphere to make predictions about the future. Models don’t always match reality, as you’ve experienced if you’ve been surprised by rain or snow. And even the most complicated models of weather can’t make precise predictions more than a few weeks into the future. Still, weather forecasts are useful enough that we check the forecast before heading outside each day.

In Chapter 3, we introduced the urn model. Like all models, the urn model is a simplified version of a system. It treats the underlying chance process in data generation like draws of marbles from an urn. In this chapter we introduce another kind of model called the constant model. While the urn model creates simulated data, the constant model takes a data sample and tries to describe the signal in the data by taking out the random variation in the sample. This process is called fitting a model to data. Although the constant model is simple, it serves as a useful building block for the more complex models that appear later in the book.

For example, the constant model lets us explain model fitting from the perspective of loss minimization, a technique that connects summary statistics like the mean and median to more complex models. It also gives us a first look at randomness and signal in a sample, fundamental parts of modeling that we address later in the book.
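To make the link between loss minimization and summary statistics concrete, here is a minimal sketch in Python. It assumes a small made-up sample (not data from this book) and two common loss functions, mean squared error and mean absolute error, which are our choices for illustration. The names `sample`, `thetas`, `mean_squared_error`, and `mean_absolute_error` are hypothetical. The sketch tries many candidate constants and keeps the one with the smallest average loss.

```python
import numpy as np

# Hypothetical sample: made-up values chosen for illustration only.
sample = np.array([2.0, 3.0, 5.0, 8.0, 12.0])

def mean_squared_error(theta, y):
    """Average squared difference between the constant theta and the data."""
    return np.mean((y - theta) ** 2)

def mean_absolute_error(theta, y):
    """Average absolute difference between the constant theta and the data."""
    return np.mean(np.abs(y - theta))

# Candidate constants on an evenly spaced grid (step size 0.01).
thetas = np.linspace(0, 15, 1501)

# Evaluate each loss at every candidate constant.
mse = np.array([mean_squared_error(t, sample) for t in thetas])
mae = np.array([mean_absolute_error(t, sample) for t in thetas])

# The fitted constant is the candidate with the smallest average loss.
best_mse = thetas[np.argmin(mse)]
best_mae = thetas[np.argmin(mae)]

print("minimizer of squared loss: ", best_mse, " sample mean:  ", sample.mean())
print("minimizer of absolute loss:", best_mae, " sample median:", np.median(sample))
```

Under these assumptions, the constant that minimizes squared loss lands on the sample mean, and the constant that minimizes absolute loss lands on the sample median, up to the resolution of the grid. A grid search is used here only because it is easy to read; it is not the only way to find the minimizer.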

We’ll begin by introducing the constant model through a dataset of bus stop wait times.