One of the principles of data visualization that I most appreciate is: avoid clutter.
Edward Tufte famously coined the term chartjunk:
The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies — to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk.
Follow data viz experts and one of the main messages you’ll hear over and over is: reduce clutter and focus the viewer’s attention on the main point that you want them to take from the figure.
Ann Emery, giving an example of making a figure more compelling and useful: “I would remove a lot of unnecessary ink. I would delete the logo and the redundant sentences and remove the gray background shading. I would also remove the 3D effect, which was distorting the data—making the columns look taller or smaller than they really were.”
Stephanie Evergreen (along with Ann Emery) in their dataviz checklist: “Focus attention by removing the redundancy. For example, in line charts, label every other year on an axis. Do not add numeric labels *and* use a y-axis scale, since this is redundant.”
The problem is, people are used to ineffective visualization. They hold certain principles to be true because they’ve seen them over and over, not because they have any basis in research on the effectiveness of data visualization. Or, as Tufte would put it, they assume that extra details add precision and rigor when they really just distract from the main message of the figure.
In my work doing data visualization, I constantly run up against my clients’ desire to label every single point.
Yes: Every. Single. Point.
I send them a clean figure like this (note it is similar to something I made recently, but with fake data):
And they reply, “Yes, but can you add labels for all 25 years?” Ugh. Labeling every single point would create a cluttered mess. More importantly, this figure shows the overall trend over 25 years. Adding labels would take away from this, focusing the viewer’s attention on each individual label.
Here’s a solution that lets you keep the design minimal and effective while also appeasing readers who want to see every single point labeled: use interactivity. Check out the figure below.
Looks pretty similar to the first one, right? Well, it is, except for one key difference. If you hover over any point, there is a tooltip that gives the exact retention rate in any given year. This allows the graph to be presented in a way that keeps the focus on the forest while allowing those who care to examine the trees.
Using interactivity here is not just a shiny new way to report research and evaluation results. It’s also effective. You can keep the figure focused on the overall trend while letting those who care dig into the data more deeply.
Curious how I made the interactive figure? I used an R package called ggiraph, which lets you take figures made in ggplot2 and make them interactive with just a couple of lines of code. I’ve posted the code I used to make all figures above here.
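To give a sense of how little code this takes, here is a minimal sketch of the general approach. It is not the code linked above. It assumes the ggplot2 and ggiraph packages are installed, and the data, column names (`year`, `rate`, `label`), and object names are all made up for illustration.

```r
library(ggplot2)
library(ggiraph)

# Fake data for illustration only: 25 years of a made-up retention rate
set.seed(42)
retention <- data.frame(
  year = 1996:2020,
  rate = seq(0.60, 0.85, length.out = 25) + rnorm(25, sd = 0.02)
)

# Text shown in the tooltip when hovering over a point
retention$label <- paste0(
  retention$year, ": ", round(retention$rate * 100, 1), "%"
)

# A plain line chart with no point labels. The only ggiraph-specific
# piece is geom_point_interactive(), which accepts a tooltip aesthetic.
p <- ggplot(retention, aes(x = year, y = rate)) +
  geom_line() +
  geom_point_interactive(aes(tooltip = label), size = 2) +
  scale_y_continuous(labels = function(x) paste0(round(x * 100), "%")) +
  labs(x = NULL, y = "Retention rate") +
  theme_minimal()

# Convert the ggplot object into an interactive HTML widget
interactive_plot <- girafe(ggobj = p)
interactive_plot
```

The static chart and the interactive one differ only in swapping `geom_point()` for `geom_point_interactive()` and wrapping the plot in `girafe()`, which is why the two figures look nearly identical until you hover.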
If R isn’t your thing, I highly recommend checking out Datawrapper (see this post I wrote about using Datawrapper) and Flourish. With both tools, you can easily generate interactive figures that you can embed on any website.