2.6. Summary

No matter the kind of data you are working with, before diving into cleaning, exploration, and analysis, take a moment to look into the data’s source. Consider the scope of the data. Some questions to address are:

  • Why were the data collected?

  • What is the target population (or unknown parameter value)?

  • How was the target accessed?

  • What methods were used to select samples/take measurements?

  • What instruments were used and how were they calibrated?

Questions about when and where the data were collected can also provide valuable insights about data scope:

  • When were the data collected?

  • Where were the data collected?

Answering as many of these questions as possible can give you valuable insight into how much trust you can place in your findings and how far you can generalize them. This chapter has provided you with terminology and a framework for thinking about and answering these questions. The chapter has also outlined ways to identify possible sources of bias and variance that can impact the accuracy of your findings.

We have introduced three models to help you think about how bias and variance arise:

  • Venn diagram indicating the overlap between target population, access frame, and sample;

  • Dart board for describing an instrument’s bias and variance; and

  • Urn model for situations where a chance mechanism has been used to select a sample from an access frame, divide a group into experimental treatment groups, or take measurements from a well-calibrated instrument.
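
As a rough illustration of the urn model, the sketch below treats an access frame as a list of marbles and uses a chance mechanism to draw a simple random sample and to split units into treatment and control groups. The urn contents (30 marbles marked 1 and 70 marked 0), the sample size, and the function names `draw_sample` and `assign_treatment` are assumptions made up for this example; they do not come from the chapter.

```python
import random

def draw_sample(urn, n, seed=None):
    """Draw n marbles from the urn without replacement (a simple random sample)."""
    rng = random.Random(seed)
    return rng.sample(urn, n)

def assign_treatment(units, n_treatment, seed=None):
    """Randomly divide units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = rng.sample(units, len(units))  # a random permutation of the units
    return shuffled[:n_treatment], shuffled[n_treatment:]

# Hypothetical access frame: 100 units, 30 marked 1 and 70 marked 0.
urn = [1] * 30 + [0] * 70

# One sample of 10 marbles and the proportion of 1s it contains.
sample = draw_sample(urn, n=10, seed=42)
sample_proportion = sum(sample) / len(sample)

# Repeating the draw with different seeds shows how the sample
# proportion varies from one sample to the next.
proportions = [sum(draw_sample(urn, 10, seed=s)) / 10 for s in range(5)]

# Randomly assign 20 hypothetical units to two groups of 10.
treatment, control = assign_treatment(list(range(20)), n_treatment=10, seed=0)

print(sample_proportion)
print(proportions)
```

Each sample proportion is an estimate of the urn's true proportion of 1s (0.3 in this made-up urn), and the spread of the repeated draws is the chance variation that the urn model is meant to describe.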

Chapter 3 continues the development of the urn model to more formally quantify accuracy.