4. Modeling with Summary Statistics

A model is an idealized representation of a system. You probably use models all the time. For instance, a weather forecast is a model. A weather forecast uses past weather, current conditions, and the physics of the atmosphere to make predictions about the future. Models don’t always match reality, as you’ve experienced if you’ve been surprised by rain or snow. And even the most complicated models of weather can’t make precise predictions more than a few weeks into the future. Still, weather forecasts are useful enough that we check the forecast before heading outside each day.

In Chapter 3, we introduced the urn model, which likens the chance process that generates data to drawing marbles from an urn; we used it to study variation. In this chapter, we introduce another kind of model, one that describes the pattern, or signal, in the data rather than its random variation. Choosing such a model to match a dataset is called fitting a model to data. We focus on the simplest of these models, the constant model, which serves as a useful building block for the more complex models that appear later in the book.

The constant model lets us introduce model fitting from the perspective of loss minimization, which connects summary statistics like the mean and median to more complex models. We begin by using data on bus wait times to introduce the constant model.
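As a preview of how loss minimization connects a constant model to familiar summary statistics, here is a minimal sketch. It assumes two common loss functions, squared loss and absolute loss, and uses a handful of made-up wait times rather than the chapter's bus data. The helper names `mean_squared_loss` and `mean_absolute_loss` are hypothetical. The code predicts every observation with a single constant `theta`, tries many candidate values, and keeps the one with the smallest average loss.

```python
import numpy as np

# Made-up bus wait times in minutes (illustrative only, not the chapter's data)
waits = np.array([2.0, 3.0, 4.0, 5.0, 16.0])

def mean_squared_loss(theta, y):
    # Average squared error when every observation is predicted by the constant theta
    return np.mean((y - theta) ** 2)

def mean_absolute_loss(theta, y):
    # Average absolute error for the same constant prediction
    return np.mean(np.abs(y - theta))

# Grid of candidate constants from 0 to 20 minutes in steps of 0.01
thetas = np.linspace(0, 20, 2001)

# Keep the candidate with the smallest average loss under each loss function
best_sq = thetas[np.argmin([mean_squared_loss(t, waits) for t in thetas])]
best_abs = thetas[np.argmin([mean_absolute_loss(t, waits) for t in thetas])]

print(best_sq, np.mean(waits))     # both about 6.0
print(best_abs, np.median(waits))  # both about 4.0
```

Under these assumptions, the constant that minimizes average squared loss lands on the mean, while the one that minimizes average absolute loss lands on the median. Note how the single long wait of 16 minutes pulls the squared-loss fit upward but leaves the absolute-loss fit unchanged. A grid search is used here only because it is easy to read; it is not meant to suggest how the chapter itself finds the minimizer.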