9.7. Summary

Data wrangling is an essential part of data analysis. Without it, we risk overlooking problems in data that can have major consequences for future analysis. This chapter covered several important data wrangling steps that we use in nearly every analysis.

We described what to look for in a dataset after we’ve read it into a dataframe. Quality checks help us spot problems in the data. Missing values are an especially important and common issue, and we provided guidelines on imputing missing values. We transform data to make them easier to analyze, and we talked about transformations that modify the structure of a dataframe.
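As a compact reminder of how these steps fit together, here is a minimal sketch in pandas. The dataframe, its column names, and the `-99.99` missing-value code are all made up for illustration, and linear interpolation is only one of several reasonable imputation choices, not a recommendation for every dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data in "wide" format: one column per station.
# The value -99.99 is an assumed sentinel code for a missing reading.
df = pd.DataFrame({
    "month": [1, 2, 3, 4],
    "station_a": [410.2, -99.99, 411.5, 412.1],
    "station_b": [409.8, 410.4, np.nan, 411.9],
})
value_cols = ["station_a", "station_b"]

# Quality check: look at shape, types, and implausible (negative) values.
print(df.shape)
print(df.dtypes)
print((df[value_cols] < 0).sum())

# Missing values: recode the sentinel as NaN, then count per column.
df[value_cols] = df[value_cols].replace(-99.99, np.nan)
n_missing = df[value_cols].isna().sum()
print(n_missing)

# Imputation (one option): fill each gap by linear interpolation
# between the neighboring readings in the same column.
imputed = df.copy()
imputed[value_cols] = df[value_cols].interpolate(method="linear")

# Structure transformation: reshape from wide to long, so that each
# row holds a single (month, station, reading) observation.
long = imputed.melt(id_vars="month", var_name="station", value_name="reading")
print(long)
```

Keeping the imputed values in a copy rather than overwriting `df` makes it easy to compare the data before and after imputation, which is useful when checking whether the imputation changes later results.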

We illustrated these techniques through two sections with more detailed examples of data wrangling, one on the CO2 data and one on the restaurant safety data. Together, the data wrangling techniques in this chapter prepare the data for exploratory data analysis, the topic of the next chapter.