Recently, a Kickstarter project raised more than 3,000 EUR in one month to campaign for a ban on all incorrect uses of bar plots in scientific journals. This shows two things: a lot of the plots in scientific journals are somewhat misleading, and a growing number of people feel very uneasy about this!
What exactly is wrong with bar plots? Nothing per se, but everything goes wrong when you use a bar plot to summarize statistical data. This kind of plot is infamous as the “dynamite plot”: the familiar vertical or horizontal bars, often in a dazzling array of colors or patterns, with big fat black outlines and overly prominent error bars.
Are they common? Very much so! My personal survey [i] of “dynamite plots” in scientific journals found that 30-60% of articles use them, even in journals covering subjects such as physics, meteorology or psychology, where authors typically have rigorous training in applied mathematics. The prevalence rises in journals focused on the life sciences, where 50-70% of articles include a dynamite plot showing a statistical summary [ii].
Most of us are so used to dynamite plots that we happily use them, until we see the light. From then on it is impossible not to hate them, because it becomes obvious that they are misleading and make the data harder to read than necessary. And as scientists, we aim for clarity and for getting information across concisely!
The top reasons to avoid dynamite plots
- They hide the real distribution of the data. Do all data points cluster closely? Do they form two groups? Or is there one drastic outlier? A bar with an error bar invites us to assume a normal distribution around the mean where there might not be one! (In my own survey, the numbers of dynamite plots per journal happened to be more or less normally distributed.)
- They hide the sample size. From the bar plot you would not have known that I probed one issue of Nature, two issues of Cell and four issues of Development! But knowing the sample size is essential for evaluating scientific data properly! Too often we have to search for the n in the axis labels, the figure legend, the results or the methods section to finally find it. And sometimes it is omitted entirely. In my opinion, a clear statement of the sample size is also critical for peer review and should be demanded by reviewers! Not showing the data, or showing only summary data, should be treated the same as cropping Western blot bands!
- Many different distributions of data can produce the very same bar! See Anscombe's quartet. Bar plots are not meant to show statistical distributions; they are for absolute numbers. By plotting the real data we also learn more about the biology!
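To make the first and last points concrete, here is a minimal sketch in Python (an assumption on my part: the post itself uses no code, and R would do the job just as well). It takes three hypothetical samples, a tight cluster, two groups and one drastic outlier, and linearly rescales each so that all three share exactly the same mean and standard deviation. The helper `rescale` and the data values are made up for illustration. A dynamite plot of these samples would show three identical bars with identical error bars, yet the underlying data look nothing alike.

```python
import statistics

def rescale(sample, target_mean, target_sd):
    """Hypothetical helper: linearly rescale a sample to a given mean and sample SD."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return [(x - m) / s * target_sd + target_mean for x in sample]

# Hypothetical raw samples with very different shapes (illustrative values only)
raw = {
    "tight cluster": [9, 10, 10, 10, 11, 10, 9, 11],
    "two groups": [1, 2, 1, 18, 19, 18, 19, 18],
    "one outlier": [8, 9, 9, 10, 9, 8, 9, 30],
}

# After rescaling, every sample has mean 10 and SD 2, so a dynamite plot
# would draw three identical bars with identical error bars.
samples = {name: rescale(xs, 10.0, 2.0) for name, xs in raw.items()}

for name, xs in samples.items():
    print(
        f"{name:14s} mean={statistics.mean(xs):.2f} sd={statistics.stdev(xs):.2f} "
        f"median={statistics.median(xs):.2f} min={min(xs):.2f} max={max(xs):.2f}"
    )
```

The printed means and SDs agree for all three samples, while the medians come out at roughly 10.0, 11.4 and 9.3 and the minima and maxima differ too. Only plotting the individual points (or at least a box plot) reveals the difference.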
Not quite convinced? Seeing is believing, so check out this figure:
For more, watch the Kickstarter campaign video (British accent and humor alert!), ideally with your entire lab, followed by a discussion of this seminal paper on the misuse of bar charts and this survey of their prevalence in biomedical journals!
Practical advice to avoid dynamite plots
- Plot your charts with the statistical programming language R. You either have to learn it or be really nice to someone who knows it. If your PhD requires three box plots, maybe invest in a friendly relationship with the bioinformatics geek in your department; a couple of coffees go a long way!
- Learn how to make box plots in Excel! (Here and here are guides, but it's a bit tedious.)
- Can't be bothered to do either? Use one of the available web tools, such as the boxplot maker from the Tyers lab or the plot generator from the University of Belgrade.
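If you are curious what a box plot actually draws, the numbers are easy to compute yourself. The sketch below is a Python illustration under stated assumptions, not necessarily how R, Excel or the web tools above do it: it uses `statistics.quantiles` with its default "exclusive" method for the quartiles and the common Tukey rule of 1.5 × IQR for the whiskers. Other tools use slightly different quartile conventions, so their values may not match exactly. The helper `box_stats` and the data are hypothetical. Note that it also reports n, which a bar usually hides.

```python
import statistics

def box_stats(sample):
    """Hypothetical helper: the numbers a Tukey-style box plot draws, plus n."""
    xs = sorted(sample)
    # Quartiles via the stdlib default ('exclusive') method; tools differ slightly
    q1, median, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    # Points beyond 1.5 * IQR from the box are drawn individually as outliers
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if lo_fence <= x <= hi_fence]
    return {
        "n": len(xs),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in xs if x < lo_fence or x > hi_fence],
    }

# Hypothetical measurements with one drastic outlier (illustrative values only)
data = [8, 9, 9, 10, 9, 8, 9, 30]
print(box_stats(data))
print("mean:", statistics.mean(data))
```

For these values the median is 9, the upper quartile is 9.75 and 30 is flagged as an outlier, while the mean is 11.5, above the upper quartile. A single bar would report that inflated mean without any hint of the one point responsible for it.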
[i] I probed the top 10 articles of Nature in July, the three most recent volumes of Science (August), four issues of Development (Vol 138, 1:3-2011 and Jan 2016), and two issues of Cell from 2016 (January and August). I was lenient and gave authors the benefit of the doubt when I wasn't sure, but I was strict when authors mixed correct and incorrect uses of bar plots. How does that even happen? A mix of co-authors, some of whom know better than others?
[ii] Disclaimer: this does not mean the remaining articles have great figure design! I saw multiple 3D pie charts, rainbow color schemes, other poorly thought-out uses of color, incomprehensible spider graphs and 3D heat maps! Maybe I will devote another blog post to those.