helena * jambor

scientist interested in RNA, genomics and science visualizations

Month: April, 2016

By all means: avoid 3D!

You have so much nice data you want to show, but sadly only one flat piece of paper. Are 3-dimensional graphs a good solution? Quick answer: no, never, ever. I will explain why and show a recent example that I worked on.

We often want (or need) to show a lot of data at once: say, the body temperature of mice over time, or RNA expression during cell differentiation. If the data series diverge (and are color-coded!), this quickly results in a highly cluttered graph. As a consequence, the audience has to really “read” the data to decide for themselves what the main message is.

PLot_spagetti

We also have a problem if the data series are similar and partially overlap. Again, the resulting graph is hard to read.

PLot_overlap

What to do? To avoid such overlap of data points, we are tempted to use 3-dimensional graphs: each data series can then be read individually. However, a 3-dimensional plot creates more problems than it solves:

  • A reduction that is shown further back along the z-axis (green data!) is visually raised, and consequently its full extent cannot be appreciated. Vice versa, an increase would look much more dramatic if shown in the background. Both are misleading!
  • It is almost impossible to read values off the y-axis correctly. What is the height of the first green peak? I’d have to use a ruler to assess where the peak crosses the y-axis (3rd tick) and then subtract the height at which the green baseline crosses the y-axis (0.5 ticks). Quite a lot of work!

PLot_3D
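To see why the reader has to do this arithmetic, here is a deliberately simplified model in Python. It assumes an oblique 3D projection in which every series one step further back along the z-axis is drawn with its baseline raised by a fixed number of y-axis ticks (my assumption; real 3D renderers add perspective on top of this). The function names are hypothetical, and the numbers are the green-peak readings from above, in tick units.

```python
def drawn_height(value: float, depth: int, shift_per_series: float) -> float:
    """Where a data value appears against the y-axis in a simple oblique 3D plot.

    Assumption: each series one step further back (depth + 1) is drawn with
    its baseline raised by `shift_per_series` tick units; no perspective.
    """
    return value + depth * shift_per_series


def recover_value(height_on_axis: float, baseline_on_axis: float) -> float:
    """The reader's mental work: read the peak, then subtract the raised baseline."""
    return height_on_axis - baseline_on_axis


# The green peak from the text, in y-axis tick units.
peak_on_axis = 3.0      # the peak appears to cross the y-axis at the 3rd tick
baseline_on_axis = 0.5  # the green baseline crosses the y-axis at 0.5 ticks
true_peak = recover_value(peak_on_axis, baseline_on_axis)
print(f"true peak: {true_peak} ticks")

# A reader who skips the subtraction overestimates by the baseline offset
# (assuming green sits one step back with a 0.5-tick shift per step).
naive_reading = drawn_height(true_peak, depth=1, shift_per_series=0.5)
print(f"naive reading: {naive_reading} ticks ({naive_reading / true_peak:.0%} of the true value)")
```

In this model every value of a background series reads too high by the same offset, so a reduction plotted at the back never appears to drop as low as it really does.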

Solution

Show each data series individually, and dare to show it small: the main point will still be clear! And make use of the power of small multiples: the reader has to read the axes only once and can then apply this knowledge to all of the individual plots at once!

Note: the resulting picture is not bigger than the original and could probably be reduced further in size while still being fully readable!

Plot_result_multiples

PS. To increase clarity, I muted the colors of the y-axis and the gene model and show them in grey (there is no need to show each exon in a different color!). I then use color ONLY to highlight the main message: a strong reduction of RNA expression in homozygous mutants. By separating the data into three plots, I avoid having to give each series its own color.
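If you plot in Python, the same idea can be sketched with matplotlib (my choice of tool, not necessarily how the figure above was made). The function `small_multiples`, the genotype labels and the toy values are hypothetical, made up for illustration. The sketch shares both axes across the panels so they are read only once, starts the y-axis at zero, draws the frame and the context series in grey, labels each panel directly instead of using a legend, and reserves a single color for the series that carries the message.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive use
import matplotlib.pyplot as plt


def small_multiples(x, series, highlight, highlight_color="tab:orange"):
    """Draw one panel per series with shared axes; only `highlight` is colored."""
    n = len(series)
    fig, axes = plt.subplots(n, 1, sharex=True, sharey=True,
                             figsize=(4, 1.2 * n), squeeze=False)
    axes = axes[:, 0]  # flatten the single column of Axes
    for ax, (name, y) in zip(axes, series.items()):
        color = highlight_color if name == highlight else "0.5"  # "0.5" = mid grey
        ax.plot(x, y, color=color, linewidth=1.5)
        # Label each panel directly instead of adding a legend.
        ax.set_title(name, loc="left", fontsize=9, color=color)
        # Mute the frame: hide top/right spines, thin grey left/bottom ones.
        for side in ("top", "right"):
            ax.spines[side].set_visible(False)
        for side in ("left", "bottom"):
            ax.spines[side].set_color("0.6")
            ax.spines[side].set_linewidth(0.6)
        ax.tick_params(colors="0.4", labelsize=8)
    # The y-axes are shared, so this starts every panel at zero.
    axes[0].set_ylim(bottom=0)
    fig.tight_layout()
    return fig, axes


# Made-up toy values for illustration only (not the data from the figure).
x = list(range(8))
series = {
    "wild type":    [1, 2, 5, 8, 8, 5, 2, 1],
    "heterozygous": [1, 2, 4, 6, 6, 4, 2, 1],
    "homozygous":   [1, 1, 1, 2, 2, 1, 1, 1],
}
fig, axes = small_multiples(x, series, highlight="homozygous")
# fig.savefig("small_multiples.png", dpi=200)
```

Because `sharey=True` links the panels, setting the lower y-limit on the first panel applies to all three. With independent y-axes, each panel would be rescaled on its own and the reduction in the highlighted panel could be hidden.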


Evolution and the hourglass

Darwin died 134 years ago today: a suitable day to share a data visualization on evolution!

In the early 1800s, Karl von Baer made a couple of observations that led to what is now commonly known as Baer’s laws of embryology. These state that while embryos of various species look strikingly different at the beginning of embryonic development and as adults, there is one time point at which variation is minimal and which typifies a phylum: the phylotypic stage. Baer’s observations were later developed further and became known as the developmental “hourglass”* (Sander 1983?).

The hourglass model states that developmental constraints work against variation, but, like many evolutionary models, it lacked experimental validation. How can one recapitulate or test an experiment that in nature took billions of years? I fondly remember my teacher Ingo Wallat’s classes on evolution and was therefore delighted to find, when I joined Pavel Tomancak’s lab, that a team around Alex Kalinka was collecting the first molecular evidence for the developmental hourglass and von Baer’s 200-year-old theory!

Their paper was published in 2010 (also available here), but I must admit that the nature of the evidence was initially hard to grasp for an RNA biologist like myself! I therefore decided to create an illustration of their findings to explain the science to a wider audience, and maybe also to high-school students!

Kalinka_2

* Haeckel beautifully illustrated a similar idea of his own: that embryonic development is a recapitulation of evolution. In fact, his drawings are most often used to illustrate the developmental hourglass, a great case in point for the power of a wonderful scientific illustration!

Science visualization 3: Redraw Figure 1

Part 3 on “How to accentuate the figures of a scientific paper”:

Re-drawing of Figures

Like most scientists, I have no formal training in scientific data visualization. I rely on books that are primarily written for journalists working with data and for people in business. We learn some aspects of data visualization in statistics and mathematics courses, but how to use color effectively, for example, is rarely part of the curriculum. Apart from reading, I train myself by analyzing the figures in scientific publications. To help you improve too, I show here, for four example figures, how I analyze them and which changes I suggest.

Figure1_notes

1. Why is there a line here? Its sole purpose seems to be to separate panels A and B. This is not necessary if enough space is left between the panels and the panel contents are clearly grouped. Solution: remove the line and integrate the labels of the schematic model (“Liquid Disordered, Ld” etc.) clearly into panel B. At the moment they float into the space of panel A and are visually cut off from panel B itself! In addition, I have integrated headers directly into the figure; by now most journals accept this!

2. Inconsistency of labels: in panel A we see the structures of cholesterol and diplopterol, but neither is mentioned in B. Solution: for consistency, the relationship between cholesterol, diplopterol, sterols and hopanoids should be made clear, especially since these terms are used throughout the paper.

3. Simplify labels 1: is it necessary to explain the arrow and its strike-through separately? Solution: explain both more simply!

4. Simplify labels 2: redundancy between schematic and legend. Solution: integrate part of the legend into the schematic. This would reduce clutter and increase the readability of both the schematic and the legend itself!

5. Color choice of the lipids: it is not clear why some head groups are yellow and others green. Is it really necessary to distinguish these features of the lipids by color? Solution: remove all colors from lipids that are not the focus of this study; saturated and unsaturated lipids are easily distinguished by their strikingly different shapes!

Figure1_redo

After

Voila!

And now the same for Figures 2-4!

Science visualization 3: Redraw Figures 2-4

Part 3 on “How to accentuate the figures of a scientific paper”:

Re-drawing of Figures (2-4):

Figure 2

Figure2_notes

Before

  1. Layout: The axis line is too heavy; it is almost more prominent than the data. I usually advocate muting it, for example by drawing it as a thin grey line. If a legend can be placed within the chart area, one can most often simply label the data lines themselves in the corresponding color! That way, reading the entire graph takes even less time.
  2. Color scheme: For the entire figure set, I have reserved color exclusively for the data on the hopanoid diplopterol (yellow), while the control experiments are shown in shades of grey.
  3. Gridlines: in 99 out of 100 cases, they are not necessary to guide the reader through the data. Here, however, they are used to point to the condensation plot on the right, and it takes some effort to figure that out! I solved this by unlinking the axes of the monolayer data and the condensation plot.
  4. Axes: It was not immediately obvious that the condensation plot shares the y-axis with the SM monolayer plot. I have unlinked the two plots and added a new axis to the condensation plot. In addition, the error bars are very prominent and in some cases even hide the data bar.
  5. Bar versus boxplot: Here the median of several experiments is shown in a bar graph; this would be better shown in a boxplot. Even better, had I had access to the raw data, would have been to show the distribution of the actual data points (Beyond the bargraph). Or, a more radical solution would be to simply state the two numbers! Usually, a plot is not necessary when only two numbers are compared.
  6. Rotated text: hard to read, and almost always worth the space to avoid! Here, I did so by using two lines of text, which also made it possible to remove the abbreviations entirely!
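Why a boxplot tells more than a bar: a bar encodes one number per group, a boxplot five. The stdlib-only Python sketch below uses two made-up samples (hypothetical values, not the paper's data) whose medians, and therefore whose median bars, are identical, while their spread is very different. `statistics.quantiles` needs Python 3.8 or later, and its default method is only one of several quartile conventions, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum: what a boxplot draws."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default "exclusive" method
    return min(values), q1, statistics.median(values), q3, max(values)


# Made-up samples for illustration only.
samples = {
    "tight":  [4.8, 4.9, 5.0, 5.1, 5.2],
    "spread": [3.0, 4.0, 5.0, 6.0, 7.0],
}

summaries = {name: five_number_summary(vals) for name, vals in samples.items()}
for name, (lo, q1, med, q3, hi) in summaries.items():
    # A bar of the median would look identical for both samples;
    # the interquartile range (IQR) and the range reveal the difference.
    print(f"{name:>6}: median {med:.2f}, IQR {q3 - q1:.2f}, range {lo:.1f} to {hi:.1f}")
```

The printed lines also illustrate the more radical option from point 5: when only two groups are compared, stating their medians and spreads as text may say everything a plot would.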

 Figure2_redraw_version4

After

Figure 3

Figure3_notes

Before

  1. Color scheme: Here, values from measuring membrane packing are shown. These lie on a single scale, hence a single color would be sufficient, and easier to read! And even if these were diverging data that critically needed two colors (above/below a threshold, for example), one should not choose a rainbow color scale. As documented in many, many, many blog posts and opinion pieces, rainbow colors do not faithfully reveal graded distributions (Rainbow color map still considered harmful!).
  2. Label clearly: new abbreviations are used, but not introduced in the figure itself – again, it is almost always worth the extra space to increase readability. And here, we have a lot of space!
  3. Clutter: the extra line is supposed to separate figure part A from B and C. See Figure 1: if the spacing and grouping of panels and panel parts are done clearly, there is no need for a separating line.
  4. Order of panels: Figures are “read” just like a text, from left to right. Therefore panel C will be read before panel While fixing this is sometimes really tricky, in this case it is easy!
  5. Intersection of x- and y-axis: as a rule (with a few notable exceptions), the x-axis should intersect the y-axis at zero! Also, in this panel the weight of the axes and lines, as well as the color scheme, does not match the other figures (but in this case I lacked the original data and therefore could not implement the changes).
  6. Interrupted axes: interruptions of any axis are best avoided, or at least justified by the data. In this case, I think the interruption is not necessary at all! The plot shows the mean of the GP index shown in panel A (and the same value for ordered and disordered areas). I used grey bars to guide the eye to the mean values and reserved the white background for the additional calculations of the means of subpopulations.
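The claim that rainbow scales do not reveal graded values faithfully can be checked numerically. The stdlib-only Python sketch below uses an HSV hue sweep from red to blue as a stand-in for rainbow maps such as “jet” (an assumption; real rainbow maps differ in detail) and a straight ramp from white to one dark blue as the single-color alternative; the dark-blue value is an arbitrary illustrative choice. It uses the sRGB relative-luminance formula as a rough proxy for perceived lightness: along the single-hue ramp, luminance changes in one direction only, while along the rainbow it rises and falls, so lighter does not consistently mean higher (or lower).

```python
import colorsys


def srgb_to_linear(c: float) -> float:
    """Undo the sRGB gamma for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb) -> float:
    """sRGB relative luminance, used here as a rough proxy for lightness."""
    r, g, b = (srgb_to_linear(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def rainbow(n: int):
    """n >= 2 colors: HSV hue sweep from red (h=0) to blue (h=2/3), full saturation."""
    return [colorsys.hsv_to_rgb(2 / 3 * i / (n - 1), 1.0, 1.0) for i in range(n)]


def single_hue(n: int, dark=(0.03, 0.19, 0.42)):
    """n >= 2 colors: straight ramp from white to one dark blue (illustrative choice)."""
    return [tuple(1 + (d - 1) * i / (n - 1) for d in dark) for i in range(n)]


def is_monotonic(values) -> bool:
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


for name, ramp in (("rainbow", rainbow(9)), ("single hue", single_hue(9))):
    lum = [relative_luminance(c) for c in ramp]
    print(f"{name:>10}: {[round(v, 2) for v in lum]} monotonic={is_monotonic(lum)}")
```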
Figure3_redraw_version3

After

Figure 4

Figure4_notes

Before

  1. Labeling of the structures could be slightly improved for clarity, especially since the names are re-used in the figure and paper.
  2. Spacing of panel parts: the spacing of the charts in panel B could be improved to increase readability, and I have used headers to guide the reader through the individual plots. Also, I have matched Figure 4B to the previous, similar Figure 2A.
  3. Data labels/legends: as before, I have again reserved color for the molecule of interest and muted and homogenized the control data (here is an article on how not to mix attributes such as color, texture etc.). The dotted line was visually more “active” than even the colored line showing the hopanoid data!!!
  4. Spacing: by spacing the parts of C better, the readability of the entire figure is enhanced.
  5. Legend: the legend is placed between the two parts of C and, in addition, is not identical to the one in B, although the two should match!
Figure4_redraw

After

Science visualization: my way (2)

Part 2 on “How to accentuate the figures of a scientific paper”:

What needs work?

After getting an overview of all figures of the publication, I use pen and paper to highlight everything that is odd and needs to be addressed: missing labels and legends, inconsistent color schemes and layouts, incomplete axes and visual clutter (for more, see the work of Edward Tufte).

As an example, I have done this for each individual figure (Part 3) and explain my observations in the accompanying text! Send me a message if you notice more; I will follow up :).

Figure1_notes

To-do list for Figure 1

I then group my work according to the type of change (layout, font, color scheme, etc.). This increases consistency and horizontal logic, and it reduces the overall working time because it saves you from going back and forth multiple times.

Next, I start redrawing the figure and play around with several solutions, and this I do strictly with pen and paper! Two stages of this are shown below.

Only once I have decided on all changes do I implement them in a graphics program.

Figure1_redo.png

New figure 1 with changes implemented.

Science visualization: my way

How to accentuate the figures of a scientific paper (the first in my series of science data visualization posts!)

After many years of grueling work in the laboratory, fighting with difficult cloning reactions, microscopy settings and Fiji plugins, it’s finally time to summarize your data and produce powerful figures for a publication. Hurray! But this is a lot of work! There are 3 main challenges for your scientific visualization:

  1. Which visual display to choose?
  2. How to deliver the key message effectively?
  3. Establishing logic in your figures!

Although this list describes the process in a linear way, in reality it is a rather chaotic procedure with lots of going back and forth. And all of us can still learn a lot about how to make the most of visual communication. Here, I will explain the entire process in small steps. As an example, I use a publication by my friend James, who studies the origin of life and lipids* (see the footnote on fat below).

Part 1: Get an overview.

The ultimate goal of figures in publications is to leverage the amazing capabilities of our visual perception and allow readers to take in the data effortlessly. This requires a clear visual language that the reader can rapidly decode. My goal here was to make James’ finding about the cholesterol-like role of hopanoids in bacteria more accessible.

Figures_atglance

All Figures of the paper next to each other

  • To get a quick overview, I put all figures next to each other, without explanatory text! Then I check how much I can already understand that way. Can I grasp the story?
  • To assess whether the data are presented in a scientifically sound and clear way, I check the display types: were the right display types chosen for this kind of data? Are the errors indicated, the axes labeled, and do the axes intersect in a useful manner?
  • Then I look at the color scheme and layout – do they guide the reader to the most important findings? Are labels and fonts consistent?
  • Last, I squint my eyes and see if there are imbalances in data presentation, too much white space, too much dark space etc.

What I notice

  • There are a couple of structures, many line charts and accompanying bar charts and some image data.
  • The orange data stands out in all figures and indeed seems to have been chosen for the key data – it is the molecule of interest to the Saenz group, the cholesterol-analog hopanoid.
  • In two places the color scheme differs: (1) in the schematic drawing of membrane architecture and (2) in the line chart in the lower left-hand corner.
  • Also, in almost every chart the bars have a different thickness and the layout of the axes changes!

My next step

I take a pen and mark every little detail in the figures that I notice as worth checking. This helps me prioritize my work on the makeover and helps me stay focused! More soon!

Figure1_notes

Mark-up of things I notice along the way

Helena

PS: I have found that this way of engaging with a publication is also a fantastic, quick way to review a paper. You might try this approach for one of your future reviews!

Footnote on Fat

* more on the topic: Fat!

Fat is incredibly important for all forms of life: it prevents us from mixing with our environment and thus enables “life”. But some exchange must be possible: oxygen and nutrients need to be taken up and salt levels adjusted. Therefore, the permeability of the membrane, which separates inside from outside, is adjustable.

In humans, membrane permeability is regulated by cholesterol. James showed that bacteria, which lack cholesterol, use hopanoids as functional analogs of cholesterol. And they might have been doing so for 3.5 billion years of Earth’s history!

Read more in his publications, for example his open-access paper in PNAS here.