12.5. In Conclusion

In this chapter, we replicated Barkjohn’s analysis. We created a model that corrects PurpleAir measurements so that they closely match AQS measurements. The accuracy of this model enables the PurpleAir sensors to be included on official US government maps, like the AirNow Fire and Smoke map. Importantly, this model gives people timely and accurate measurements of air quality.

We also applied many concepts covered in the book thus far. We used pandas code extensively throughout this analysis (and even a bit of SQL too). Data wrangling, exploratory data analysis, and data visualization were major parts of the analysis: we used these concepts to find and correct numerous issues, including granularity, missing data points, and even duplicated data values. Finally, we applied modeling concepts to create our final correction model. We reviewed loss functions and fit two constant models to the data. We then found that linear models reduce model error enough for real-world use.
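As a reminder of why a linear model can beat a constant one, here is a minimal sketch that compares the two under mean squared error. It does not reproduce the chapter's correction model. The data are synthetic, and the names `aqs`, `purpleair`, and `mse` are hypothetical stand-ins chosen for illustration. The sketch also regresses the reference values directly on the sensor values, which is a simplification.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration, not real measurements)
rng = np.random.default_rng(42)
aqs = rng.uniform(2, 40, size=200)                        # pretend reference values
purpleair = 2.0 * aqs + 5.0 + rng.normal(0, 3, size=200)  # pretend sensor values

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return np.mean((y - y_hat) ** 2)

# Constant model: the mean is the constant that minimizes mean squared error
constant_pred = np.full_like(aqs, aqs.mean())

# Linear model: least-squares fit of aqs on purpleair
# (np.polyfit returns coefficients from highest degree to lowest)
slope, intercept = np.polyfit(purpleair, aqs, deg=1)
linear_pred = intercept + slope * purpleair

constant_mse = mse(aqs, constant_pred)
linear_mse = mse(aqs, linear_pred)

print(f"constant model MSE: {constant_mse:.2f}")
print(f"linear model MSE:   {linear_mse:.2f}")
```

Because the constant model is a special case of the linear model (slope fixed at zero), the least-squares line can never have a larger error on the data it was fit to. How much smaller the error is depends on how strongly the two sets of measurements are related.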

At this point in the book, we encourage you to take stock of what you’ve learned. Pat yourself on the back, because you’ve already come a long way! The principles and techniques we’ve covered here are useful for nearly every type of data analysis, and you can readily start applying them to analyses of your own.