11. Data Visualization

As data scientists, we create data visualizations in order to understand our data and explain our analyses to other people. A plot should have a message, and it’s our job to communicate this message as clearly as possible.

In Chapter 10, we connected the choice of a statistical graph to the kind of data being plotted; we also introduced many standard plots and showed how to read them. In this chapter, we discuss the principles of effective data visualization that make it easier for our audience to grasp the message in our plot. We talk about how to choose scales for axes, handle large amounts of data with smoothing and aggregation, facilitate meaningful comparisons, incorporate study design, and add contextual information. We also show how to create plots using plotly, a popular package for plotting in Python.

One tricky part about writing a chapter on data visualization is that software packages for visualization change all the time, so any code we display can quickly get out of date. Because of this, some books avoid code entirely. We instead strike a balance: we cover high-level data visualization principles that are broadly useful, and we separately include practical plotting code that implements these principles. When new software becomes available, readers can still use our principles to guide the creation of their visualizations.