1. The Data Science Lifecycle

In data science, we use large and diverse data sets to draw conclusions about the world. In this book we discuss the principles and techniques of data science through the two lenses of computational and inferential thinking. Practically speaking, this involves the following process:

  1. Formulating a question or problem

  2. Acquiring and cleaning data

  3. Conducting exploratory data analysis

  4. Using prediction and inference to draw conclusions

New questions and problems commonly emerge after the last step of this process, so we repeat the procedure to discover new characteristics of our data. This feedback loop is so central to our work that we call it the data science lifecycle.

While simple to state, the data science lifecycle takes training and practice to do well. In fact, each topic in this book revolves around a piece of this lifecycle. We think learning to do data science is both challenging and rewarding, and we'll show you why by starting with an example.