11. Data Visualization

As data scientists, we create data visualizations to understand our data and to explain our analyses to other people. Every plot has a message, and it’s our job to communicate that message as clearly as possible.

In Chapter 10, we connected the choice of a statistical graph to the kind of data being plotted, introduced many of the standard plots, and showed how to read them. In this chapter, we discuss principles of effective data visualization that make it easier for your audience to grasp the message in your plot. Specifically, we talk about how to choose scales for axes, handle large amounts of data with smoothing and aggregation, facilitate meaningful comparisons, incorporate the study design, and add contextual information. We also show how to create plots using plotly, a popular Python plotting package.

One tricky part of writing a chapter on data visualization is that visualization software changes all the time, so any code we display can quickly go out of date. Because of this, some books avoid code entirely. We instead strike a balance: we cover high-level data visualization principles that are broadly useful, and we separately include practical plotting code that implements these principles. When new software becomes available, readers can still use these principles to guide their analyses.