When reading about co-evolution of prey and predators, I stumbled across a cute new plot type: a half boxplot, half dot plot. Wilson et al. created this plot to simultaneously visualize both the summaries of their data (center, spread) and the actual data points. This allows us, the audience, to learn a lot about their results: that the cheetah data are maybe bimodally distributed and have outliers, or that the zebra data show a curious clustering.

Your quick guide to distribution plots
The half-and-half plot, dubbed the “dox-plot” by a friend, led me to explore which visuals are commonly used for showing distributions.
Raw data
To show how the raw data is distributed, we simply use dots or bars (as in barcode plots). When there are overlapping data points, we can use transparent fill colors or “jittering”, which distributes the data points in a given area, for increased clarity.
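As a rough illustration of jittering, here is a minimal Python sketch that uses only the standard library. The measurements, the function name `jitter` and the width of 0.2 are made up for this example. Only the x position is randomized, so every point keeps its exact y-value.

```python
import random

def jitter(n_points, width=0.2, seed=0):
    """Return one random horizontal offset in [-width, +width] per data point."""
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    return [rng.uniform(-width, width) for _ in range(n_points)]

# Hypothetical measurements for one group, drawn at x position 1.
values = [4.1, 4.1, 4.2, 4.2, 4.2, 5.0, 5.1, 6.3]
x_positions = [1 + dx for dx in jitter(len(values))]

# Each point keeps its exact y-value; only x is spread out to reduce overlap.
points = list(zip(x_positions, values))
```

The resulting (x, y) pairs can be handed to any scatter plot function. A wider jitter separates the points more, but makes it harder to see which group they belong to.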
Summarizing the data: center and spread
Often, we are interested in summary statistics to judge and compare data. By convention, the center (median) is indicated with a horizontal bar and the spread (variance, standard deviation) with vertical whiskers. Common plot types for this are the “star-wars rebel fighter”, the dot plot and the boxplot. Bar plots used to be widely used, but many journals now discourage them because they conceal most of the relevant information, so they are included here only for completeness (see previous post). Very often these days I see boxplots that are overlaid with the data points. This works really well for up to 100 data points and is easy to implement with most software.
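To make the summary numbers behind a boxplot concrete, here is a small Python sketch. It assumes the common Tukey convention (box from the first to the third quartile, whiskers reaching to the most extreme points within 1.5 times the interquartile range, everything beyond drawn as outliers). Plotting software may use slightly different quartile definitions, and the function name and data are hypothetical.

```python
import statistics

def boxplot_summary(values):
    """Compute the numbers a Tukey-style boxplot would draw (assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme point still inside the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical data with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```

For this made-up data set, the value 30 falls outside the upper fence and would be drawn as a separate outlier, while the upper whisker stops at 9.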
Show shape and unusual features
For normally distributed data, the center and the spread are highly informative. However, in life science we often have bimodal distributions, clusters, or gaps. In those cases boxplots become insufficient and may conceal interesting aspects, or even be outright misleading.
For faithfully showing distributions, histograms have a long history. Here, one has to be very careful when choosing bin sizes: bins that are too large or too small can greatly distort the histogram shape and result in a misleading chart. Choosing bin sizes is a science in itself (for details see Wikipedia), but basically, it again depends on the data shape and sampling depth.
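The effect of the bin size can be seen with a few lines of Python. This is only a sketch: the data are invented to have two clusters with a gap in between, and `histogram_counts` is a hypothetical helper that counts values in equal-width bins.

```python
import math

def histogram_counts(values, bin_width):
    """Count values in equal-width bins; bins start at a multiple of bin_width."""
    start = math.floor(min(values) / bin_width) * bin_width
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return start, counts

# Hypothetical data: two clusters (around 2-3 and 8-9) separated by a gap.
values = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9, 9, 9]

_, narrow = histogram_counts(values, bin_width=1)  # gap and both peaks visible
_, wide = histogram_counts(values, bin_width=5)    # two equal bars, gap hidden
```

With a bin width of 1 the gap between the clusters shows up as a run of empty bins. With a bin width of 5 the same data collapse into two equally tall bars, and both the gap and the shape of each cluster disappear.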
Density plots are an alternative to histograms. They show only how the data are distributed, but are not yet frequently used in life science. They become very useful for large data sets that can no longer be visualized as raw data points, or where the eye would no longer be able to comprehend the spread intuitively. A rather recent but so far “happy marriage” is the violin plot. Violin plots are a fusion of the boxplot and its summary statistics with the density/shape of the data (Hintze and Nelson, 1998 doi: 10.1080/00031305.1998.10480559).
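For readers curious what a density plot actually draws, here is a minimal sketch of a Gaussian kernel density estimate in plain Python. The data and the bandwidth of 0.8 are arbitrary choices for illustration. Real plotting libraries usually pick the bandwidth automatically, and that choice matters in much the same way the bin size matters for histograms.

```python
import math

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each data point contributes a small bell curve centered on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Hypothetical bimodal data, evaluated on a grid from -2.0 to 14.0.
values = [1, 2, 2, 3, 3, 3, 7, 8, 8, 9, 9, 9]
density = gaussian_kde(values, bandwidth=0.8)
grid = [i * 0.5 for i in range(-4, 29)]
curve = [density(x) for x in grid]
```

Plotting `curve` against `grid` gives the density plot. Mirroring the same curve around a vertical axis gives the outline of a violin plot. In this example the estimated density dips between the two clusters, which is exactly the feature a boxplot alone would hide.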
Pros and cons of different plot types:
All figures as one:
How about beeswarms?
Hi Dorian,
bee-swarm plots are basically the same as plotting the original data. However, the original data is plotted in a misleading manner: the data points are not aligned horizontally as they should be, so visually the dots at the distal ends appear to have a higher y-value than they actually have! Here is a paper about this problem written by Leland Wilkinson: http://moderngraphics11.pbworks.com/f/wilkinson_1999.DotPlots.pdf
In summary: bee-swarm plots are better avoided!
Best,
Helena
Imho that depends on the rendering. In some (more classical) implementations the horizontal alignment is indeed not correct, and those should be avoided. However, in others like https://github.com/eclarke/ggbeeswarm#geom_quasirandom alignment is preserved. When used correctly I also like beeswarms to complement box or violin plots as they show the raw data but prevent overplotting (without alpha or jittering) by mimicking the shape of the distribution.
Really useful, and a great summary! Out of interest, what software programs would you recommend for generating these kinds of charts? (both licensed and freeware) Any advice much appreciated…
Hi Brooke, I personally only use R for such plots. Excel might work for most of them too; I have heard it can also do boxplots, but I have not checked yet. For distributions, I am not aware that Excel can plot them at all. Greetings!
Maybe interesting for you and others: the course on plotting distributions by Hugo Bowne-Anderson, including code for R!
https://hugobowne.github.io/Workshops/Workshop_I.html
Great, thanks! That’s a big help. I need to wean myself off Excel…