We’ve introduced the constant model: a model that summarizes all of the data by a single value. To fit the constant model, we chose a loss function that measured how well a given constant fits a data value, and we computed the average loss over all of the data values. We saw that depending on the choice of loss function, we get a different minimizing value. We found that the mean minimizes the average squared error (MSE) and the median minimizes the average absolute error (MAE). We also discussed how we can incorporate context and knowledge of our problem to pick loss functions.
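To make the contrast between the two loss functions concrete, here is a small Python sketch. The delay values are made up for illustration. The helper names `avg_squared_error` and `avg_absolute_error` are hypothetical and do not come from any library. The sketch evaluates both average losses at the mean and at the median of the same data.

```python
from statistics import mean, median

# Made-up bus delays in minutes, for illustration only (note the outlier at 14)
delays = [0.0, 1.0, 2.0, 3.0, 14.0]

def avg_squared_error(theta, data):
    """Average squared error (MSE) of the constant prediction theta."""
    return sum((y - theta) ** 2 for y in data) / len(data)

def avg_absolute_error(theta, data):
    """Average absolute error (MAE) of the constant prediction theta."""
    return sum(abs(y - theta) for y in data) / len(data)

m, med = mean(delays), median(delays)  # 4.0 and 2.0

# Evaluate both losses at both candidate constants
for name, theta in [("mean", m), ("median", med)]:
    print(f"{name}={theta}: MSE={avg_squared_error(theta, delays)}, "
          f"MAE={avg_absolute_error(theta, delays)}")
```

On this data the mean has the smaller MSE (26.0 vs. 30.0) and the median has the smaller MAE (3.2 vs. 4.0). The single large delay pulls the mean upward but barely moves the median.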
The idea of fitting models through loss minimization ties simple summary statistics—like the mean, median, and mode—to more complex modeling situations. The steps we took to model our data apply to many modeling scenarios:
1. Select the form of a model (such as the constant model)
2. Select a loss function (such as absolute error)
3. Fit the model by minimizing the average loss for the data
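The three steps map directly onto code. The Python sketch below assumes a brute-force grid search as the minimization technique in step 3, chosen only for simplicity; it is not the only (or the most precise) way to minimize the loss. The names `squared_loss`, `absolute_loss`, `average_loss`, and `fit_constant_model` are hypothetical, and the data are made up.

```python
# Step 1: model form. A constant model predicts the same value theta
# for every data value.

# Step 2: candidate loss functions comparing a data value y with theta.
def squared_loss(y, theta):
    return (y - theta) ** 2

def absolute_loss(y, theta):
    return abs(y - theta)

def average_loss(loss, theta, data):
    """Average loss of the constant prediction theta over all data values."""
    return sum(loss(y, theta) for y in data) / len(data)

# Step 3: fit by minimizing the average loss. This sketch uses a grid search
# over evenly spaced candidates between the smallest and largest data values.
def fit_constant_model(data, loss, step=0.01):
    lo, hi = min(data), max(data)
    n_steps = round((hi - lo) / step)
    candidates = [lo + i * step for i in range(n_steps + 1)]
    return min(candidates, key=lambda theta: average_loss(loss, theta, data))

delays = [0.0, 1.0, 2.0, 3.0, 14.0]  # made-up bus delays in minutes
print(fit_constant_model(delays, squared_loss))   # close to the mean, 4.0
print(fit_constant_model(delays, absolute_loss))  # close to the median, 2.0
```

Restricting the grid to the range of the data is safe for these two losses because their minimizers (the mean and the median) always fall between the smallest and largest values. Changing the loss only touches step 2, and replacing the grid search with calculus or a numerical optimizer only touches step 3.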
For the rest of this book, all of our modeling techniques expand upon one or more of these steps. We introduce new models (1), new loss functions (2), and new techniques for minimizing loss (3).
The next chapter revisits the example of a bus arriving late at its stop. This time, we step back and treat it as a case study that covers every stage of the data science lifecycle.