Follow-up to: Showing distributions
When I wrote about the half-and-half plot, many of you replied with further discussion points, tips, and tutorials. I have tried to collect them here to make them available to everyone.
More mixed boxplots
Aaron Ellison @AMaxEll17 brought to our attention that he published a plot in 1993 in which he overlaid the box plot with the data points (see fig 1A). Along with it he published the code, pre-github et al. Aaron was inspired by the just published “Grammar of Graphics” by Wilkinson. Could he be the first person to have published such a plot in a paper?
Today, boxplot/data plots are common and easy to plot in R with ggplot2. Declan O’Regan @DrDeclanORegan shows us one example in figure 1B. An “exploded” version, where the boxplot and its metrics are barely visible and the focus is on the data points, is shown in figure 1C (provided by the cystic fibrosis Gene therapy group @CFGT_Edinburgh).
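For readers who work in Python rather than R, the overlay idea can be sketched roughly as follows. This is a minimal illustration with matplotlib, not the code behind any of the figures above; the function names (`jittered_positions`, `five_number_summary`, `plot_box_with_points`) and the jitter width are made up for this example.

```python
import random
import statistics


def jittered_positions(n, center, width=0.15, seed=0):
    """Return n x-positions scattered uniformly within center +/- width."""
    rng = random.Random(seed)
    return [center + rng.uniform(-width, width) for _ in range(n)]


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def plot_box_with_points(groups, labels):
    """Draw one box plot per group and overlay the raw data points."""
    # Imported here so the two helpers above also work without matplotlib.
    import matplotlib.pyplot as plt

    positions = list(range(1, len(groups) + 1))
    fig, ax = plt.subplots()
    # Outlier markers are switched off: every point is drawn anyway.
    ax.boxplot(groups, positions=positions, showfliers=False)
    for pos, values in zip(positions, groups):
        xs = jittered_positions(len(values), pos, seed=pos)
        ax.scatter(xs, values, s=15, alpha=0.6)
    ax.set_xticks(positions)
    ax.set_xticklabels(labels)
    return fig, ax
```

Calling `plot_box_with_points([[1, 2, 3, 4, 5], [2, 4, 4, 5, 9]], ["A", "B"])` returns the figure and axes. A wider jitter and a fainter box would move the result towards the "exploded" look.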
There is also the overlay of boxplot with the bee-swarm plot. Here, individual data points are ordered and arranged in a U-shape instead of randomly placed. An example is shown by Darren Wisniewski @Dmwizzle, who made this in ggplot2 (fig 2A).
But beware of the bee-swarm: the ordered arrangement of the data (U- or A-shape being most common) may introduce visual artifacts. Personally, I mentally draw a line through the U-shaped branches and straighten it to understand the data. This is error-prone and of course a waste of time when the line could equally be straight. In figure 2B I have plotted the same data as a bee-swarm plot and as a dot plot for a direct comparison. I feel it is easier to see how the data is distributed in the data/dot plot. (Data: gene expression of RNAs that are localized at the poles in the fruit fly oocyte. RNAs that localize at the posterior for days have higher expression than RNAs at the anterior pole that are localized for just a few hours.)
Histogram & boxplot
Robert Grant @robertstats pointed us to an interesting histogram overlaid with statistical summaries that was originally designed by @f2harrell (here is a link to a tutorial with R), see figure 3. The horizontal histogram shown below has particularly small bins, and the median and quartiles are indicated underneath, for my taste a bit too small.
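To make the idea concrete, here is a rough Python sketch of a fine-binned histogram with the quartile range and median marked underneath. It is not the design from the tutorial linked above; the helper names, the bin width and the marker placement are assumptions made for illustration, and the data values run along the x-axis.

```python
import statistics


def histogram_counts(values, bin_width):
    """Count values in consecutive bins of width bin_width, starting at min(values)."""
    start = min(values)
    n_bins = int((max(values) - start) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) / bin_width)] += 1
    return start, counts


def plot_histogram_with_quartiles(values, bin_width):
    """Draw the histogram and mark quartile range and median below the bars."""
    # Imported here so histogram_counts also works without matplotlib.
    import matplotlib.pyplot as plt

    start, counts = histogram_counts(values, bin_width)
    lefts = [start + i * bin_width for i in range(len(counts))]
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

    fig, ax = plt.subplots()
    ax.bar(lefts, counts, width=bin_width, align="edge")
    # The summary sits slightly below the baseline of the bars.
    marker_y = -0.08 * max(counts)
    ax.plot([q1, q3], [marker_y, marker_y], color="black", linewidth=2)
    ax.plot([median], [marker_y], marker="o", color="black")
    return fig, ax
```

A small `bin_width` gives the fine-grained look; a thicker line and a larger marker would make the summary easier to read.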
Violin and data
Of course, there are also mixed plots with violin plots. Violin plots themselves are most often already overlaid with a boxplot. Another possibility, shown by Wouter de Coster @wouter_decoster, is to mix the violin plot with a bee-swarm plot, which he implemented with python seaborn (fig 4A). As you know, I personally would have preferred the actual data instead of the bee swarm, see above.
Joey Burant @jbburant put forward the idea of mixing data points as a histogram with half of a violin plot in , see figure 4B.
Joey also nicely documented how in github:
When the histo-violin is flipped horizontally, it looks like a raining cloud. Rogier Kievit @rogierK therefore named it the raincloud plot and has just deposited a preprint article about this plot type and its implementation. For matplotlib users, Sara Popham @sara_poppop posted a guide on github.
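As a rough idea of what goes into such a plot, here is a minimal Python sketch: a density curve as the "cloud" and the raw data, jittered, as the "rain" below it. This is neither the implementation from the preprint nor the one from the guide mentioned above. The names `kde_curve` and `plot_raincloud` are hypothetical, and the bandwidth has to be chosen by hand here.

```python
import math
import random


def kde_curve(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of values at each grid point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def plot_raincloud(values, bandwidth, seed=0):
    """Draw a horizontal half-violin (cloud) with jittered data points (rain)."""
    # Imported here so kde_curve also works without matplotlib.
    import matplotlib.pyplot as plt

    rng = random.Random(seed)
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    grid = [lo + (hi - lo) * i / 199 for i in range(200)]
    density = kde_curve(values, grid, bandwidth)
    peak = max(density)

    fig, ax = plt.subplots()
    # Cloud: the density, drawn only above the baseline.
    ax.fill_between(grid, 0, density, alpha=0.5)
    # Rain: each data point at its value, jittered vertically below the baseline.
    rain_y = [-peak * rng.uniform(0.1, 0.4) for _ in values]
    ax.scatter(values, rain_y, s=10, alpha=0.6)
    ax.set_yticks([])
    return fig, ax
```

Only the vertical position of the rain is random; the horizontal position of every point is its actual value, so the data can still be read off directly.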
Jorge Camoes @wisevis shows us that such plot types are also possible in excel: he shows us a horizontal boxplot with data points above it from his book (fig 5). I generally like horizontal boxplots, especially when comparing lots of categories! Jon Schwabish @jschwabish re-created the half-and-half plot in excel. Both are phenomenal, I had no idea excel could do this much!
… and matlab
And finally, matlab users rejoice: it is also possible to make mixed plots in your favorite environment. Matt Cooper @mattguycooper suggests using the ‘notboxplot’ function on the file exchange, which creates ‘box plots’ with dot plots overlaid. This gives you plots as shown in figure 6:
More: Tutorials and interactive plots
A couple of tutorials: Frank Soboczenski @h21k shows us the code for making half-and-half boxplots in R: https://github.com/h21k/R/blob/master/snippets/half_box.R. James Rooney @jpkrooney pointed us to a great tutorial for making violin plots with ggplot2 by Katherine Wood @kathmwood: https://inattentionalcoffee.wordpress.com/2017/02/14/data-in-the-raw-violin-plots/. And @lisadebruine compares different plot types made with the same data: https://debruine.github.io/plot_comparison.html.