1.2. Obtaining Data

In this step of the data science lifecycle, we obtain our data and understand how the data were collected. One of our goals in this stage is to understand what kinds of research questions we can answer using the data that we have. In our lifecycle, data analyses can begin with asking a question (the previous stage) or with obtaining data (this stage). When data are expensive and hard to gather, we define a precise research question first and then collect the exact data we need to answer the question. Other times, data are cheap and easily accessed. This is especially true for online data sources. For example, the Twitter website lets people quickly download millions of data points [1]. When data are plentiful, we can also start an analysis by obtaining data, exploring them, and then asking research questions.

When we obtain data, we write down how the data were collected and what information the data contain. This isn't just for bookkeeping: the types of research questions we can answer depend greatly on how the data were collected. We explore this topic in Chapter 2.