helena * jambor

scientist interested in RNA, genomics and science visualizations

Science visualization 3: Redraw Figure 1

Part 3 on “How to accentuate the figures of a scientific paper”:

Re-drawing of Figures

I, like most scientists, have no formal training in scientific data visualization. I rely on books that are primarily written for journalists dealing with data and for people working in business. We learn some aspects of data visualization in our statistics and mathematics courses, but how to use color effectively, for example, is rarely part of the curriculum. Apart from reading, I train myself by analyzing the figures in scientific publications. To help you improve too, I show here for four example figures how I analyze them and which changes I suggest.

Figure1_notes

1. Why is there a line here? It seems its sole purpose is to separate panels A and B. This is not necessary if enough space is left between the panels and the panel contents are clearly grouped. Solution: remove the line and integrate the labels of the schematic model (“Liquid Disordered, Ld” etc.) clearly into panel B – at the moment they float into the space of panel A and are visually cut off from panel B itself! In addition, I have integrated headers directly into the figure – by now most journals accept this!

2. Inconsistency of labels: in panel A we see structures of Cholesterol and Diplopterol but neither is mentioned in B. Solution: For consistency the relationship of cholesterol, diplopterol, sterol and hopanoids should be made clear, especially since these terms are used throughout the paper.

3. Simplify labels 1: is it necessary to explain the arrow and the strike-through of this arrow separately? Solution: explain it more simply!

4. Simplify labels 2: Redundancy between schematic and legend. Solution: Integrate part of the legend into the schematic – this would reduce cluttering and increase the readability of the schematic and also of the legend itself!

5. Color choice of the lipids: It is not clear why some head groups are yellow and green. Is it really necessary to distinguish these features of the lipids by color? Solution: remove all colors on lipids that are not the focus of this study – saturated and unsaturated lipids are easily distinguished based on their strikingly different shapes!

Figure1_redo

After

Voila!

And now the same for Figures 2-4!

 

 

Science visualization 3: Redraw Figures 2-4

Part 3 on “How to accentuate the figures of a scientific paper”:

Re-drawing of Figures (2-4):

Figure 2

Figure2_notes

Before

  1. Layout: The axis is too fat, it is almost more prominent than the data. Typically, I advocate muting it by showing a thin line in grey, for example. If a legend can be placed within the chart area, most often one can simply label the data lines themselves in the corresponding color! That way it takes even less time to read the entire graph.
  2. Color-scheme: For the entire figure set, I have reserved color exclusively for the data on the hopanoid diplopterol (yellow) while the control experiments are shown in shades of grey.
  3. Gridlines: are in 99/100 cases not necessary to guide the reader through the data. However, here they are used to point to the condensation plot on the right. But this takes some effort to find out! I have solved this by unlinking the axes of the monolayer data and the condensation plot.
  4. Axes: It was not immediately obvious that the condensation plot shares the y-axis with the SM monolayer plot. I have unlinked the two plots and added a new axis to the condensation plot. In addition, the error bars are very prominent and in some cases they even hide the data bar.
  5. Bar versus Boxplot: Here the median of several experiments is shown in a bar graph – this would be better shown in a boxplot. Even better, if I had had access, would have been to show the distribution of the actual data (Beyond the bargraph). Or, a more radical solution would be to just state the two numbers! Usually, a plot is not necessary when only two numbers should be compared.
  6. Rotated text: is hard to read, it is almost always worth the space to avoid it!! Here: by having two lines of text! Then one can also remove abbreviations entirely!

 Figure2_redraw_version4

After

 

Figure 3

 

Figure3_notes

Before

 

  1. Color scheme: Here, values from measuring membrane packing are shown. These are values on a single scale – hence a single color would be sufficient! And be easier to read! And even if this actually was diverging data that critically needed two colors (above/below a threshold for example), one would and should not choose a rainbow color scale. As documented in many, many, many blog posts and opinion pieces, rainbow colors do not faithfully reveal graded distributions (Rainbow color map still considered harmful!).
  2. Label clearly: new abbreviations are used, but not introduced in the figure itself – again, it is almost always worth the extra space to increase readability. And here, we have a lot of space!
  3. Cluttering: the extra line is supposed to separate figure part A from B and C. See Figure 1: if the spacing and grouping of panel and panel parts is done clearly, there is no need for a separating line.
  4. Order of panels: Figures are “read” just like a text, from left to right. Therefore panel C will be read before panel B. While fixing this is sometimes really tricky, in this case it is easy!
  5. Intersection x/y-axis: as a rule (with few notable exceptions), the x-axis should intersect with the y-axis at zero! Also in this panel, the weight of the axes and lines as well as the color scheme do not match the other figures (but in this case I lack the original data and therefore could not implement changes).
  6. Interrupted axes: interruptions of any axes should best be avoided or at least motivated by the data. In this case, I think the interruption is not necessary at all! The plot shows the mean GP index shown in panel A (and the same value for ordered and disordered areas). I have used grey bars to guide the eye to the mean values and reserved white background for the additional calculations of the mean of subpopulations.
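
The single-color idea from point 1 can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used for the redrawn figure: the function names, the default orange hue and the example value range are all assumptions made for the sketch. It maps each value on a single scale to a shade between white and one hue, so that darker always means "more".

```python
def lerp(a, b, t):
    """Linearly interpolate between a and b for t in [0, 1]."""
    return a + (b - a) * t


def single_hue_color(value, vmin, vmax, hue_rgb=(230, 159, 0)):
    """Map a value to a hex color between white (vmin) and hue_rgb (vmax).

    hue_rgb is a hypothetical default (an orange tone); any single hue works.
    Values outside [vmin, vmax] are clamped to the ends of the scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be larger than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp to the scale
    white = (255, 255, 255)
    rgb = [round(lerp(w, h, t)) for w, h in zip(white, hue_rgb)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    for v in (-0.2, 0.1, 0.4):
        print(v, single_hue_color(v, vmin=-0.2, vmax=0.4))
```

Because only the lightness changes, the reader needs no legend lookup to see which areas have higher values, which is exactly what a rainbow scale fails to deliver.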
Figure3_redraw_version3

After

 

 Figure 4

 

Figure4_notes

Before

 

  1. Labeling of the structures could be slightly improved for clarity, especially since the names are re-used in the figure and paper.
  2. Spacing of panel parts: the spacing of charts in panel B could be improved to increase readability and I have used headers to guide the reader through the individual plots. Also, I have matched Figure 4B to the previous, similar Figure 2A.
  3. Data label/legends: as before, I have again chosen color just for the molecule of interest and muted and homogenized the control data (here is an article on how not to mix attributes such as color, texture etc). The dotted line was visually more “active” than even the colored line showing the hopanoid data!!!
  4. Spacing: by spacing the parts of C better, the readability of the entire figure is enhanced.
  5. Legend: the legend is placed in between the two parts of C and in addition is not 100% identical to the one in B, although it should be!
Figure4_redraw

After

Science visualization: my way (2)

Part 2 on “How to accentuate the figures of a scientific paper”:

What needs work?

After getting an overview of all figures of the publication, I use pen and paper to highlight all things that are odd and need to be addressed – missing labels and legends, inconsistent color-schemes and layouts, incomplete axes and visual clutter (more: see work by Edward Tufte).

As an example, I have done this for each individual figure (Part 3) and explain my observations in the accompanying text! Send me a message if you notice more, I will follow up :).

Figure1_notes

To-do-list Figure 1

I then group my work according to the type of change (layout, font, color-scheme, etc) – this helps to increase consistency and horizontal logic and reduces the overall work-time as it prevents you from having to go back and forth multiple times.

Next, I start re-drawing the figure and play around with several solutions – and this I do strictly with pen and paper only! Two stages of this are shown below.

Only once I have decided on all changes do I implement them using a graphics program.

Figure1_redo.png

New figure 1 with changes implemented.

 

Science visualization: my way

How to accentuate the figures of a scientific paper (the first of a series of science data visualization posts!)

After many years of grueling work in the laboratory, fighting with difficult cloning reactions, microscopy settings and Fiji plugins it’s finally time to summarize your data and produce powerful figures for a publication. Hurray! But this is a lot of work! There are 3 main challenges for your scientific visualization:

  1. Which visual display to choose?
  2. How to deliver the key message effectively?
  3. Establishing logic in your figures!

Although this describes the process in a linear way, in reality it is a rather chaotic procedure with lots of going back and forth. And all of us can still learn a lot about how to make the most of visual communication. Here, I will explain the entire process in little steps. As an example, I use a publication of my friend James who studies the origin of life and lipids* [FOOTNOTE ON FAT].

Part 1: Get an overview.

The ultimate goal of figures in publications is to leverage the amazing capabilities of our visual perception and allow readers to take in the data effortlessly – this requires a clear visual language that the reader can rapidly decode. The goal was to make James’ finding about the cholesterol-like role of hopanoids in bacteria more accessible.

Figures_atglance

All Figures of the paper next to each other

  • To get a quick overview, I put all figures next to each other, without explanatory text! Then I determine how much I can already understand that way: can I grasp the story?
  • To assess if the data is presented in a scientifically sound and clear way, I check the display types: were the right display types chosen for this type of data? Are the errors indicated, the axes labeled and intersecting each other in a useful manner?
  • Then I look at the color scheme and layout – do they guide the reader to the most important findings? Are labels and fonts consistent?
  • Last, I squint my eyes and see if there are imbalances in data presentation, too much white space, too much dark space etc.

What I notice

  • There are a couple of structures, many line charts and accompanying bar charts and some image data.
  • The orange data stands out in all figures and indeed seems to have been chosen for the key data – it is the molecule of interest to the Saenz group, the cholesterol-analog hopanoid.
  • In two parts the color scheme differs: 1. In the schematic drawing of membrane architecture and 2. in the line chart in the lower left hand corner.
  • Also, in almost each chart the bars are of different thickness and the layout of the axes changes!

My next step

I take a pen and mark every little detail in the figures that I notice as worth checking. This helps me prioritize the work of the make-over and helps me stay focused! More soon!

Figure1_notes

Mark-up of things I notice along the way

 

Helena

PS I found that this way of engaging with a publication also works as a fantastic quick way of reviewing a paper – and you might try this approach for one of your future reviews!

 

Footnote on Fat

* more on the topic: Fat!

Fat is incredibly important for life of all forms: it prevents us from mixing with our environment and thus enables “life”. But some passage must be possible: oxygen and nutrients need to be taken up and salt levels adjusted. Therefore the permeability of the membrane, which separates inside and outside, is adjustable.

In humans, membrane permeability is regulated by cholesterol. James showed that bacteria, which lack cholesterol, use hopanoids as a functional analog of cholesterol. And they might have been doing so for 3.5 billion years of Earth's history!

Read more in his publications, for example his open-access paper in PNAS here.

 

Top 10 signs RNA is awesome

(sort of reply to Raj lab)

A BuzzFeed-like list of “top 10 signs that a field is bogus” guarantees a fun read, makes audiences crack up and all of us agree with at least one point. But sometimes it hits close to home. And in this case not even home, but right into my heart, touching on my one big love, the RNA. And RNA localization even scored the number one spot! To illustrate the cellular role of RNA localization and more generally describe the importance of RNAs I have compiled the Top 10 Signs that RNAs and their localizations are awesome!

In short:

1. Degree of RNA localization is not overrated but essentially unknown!

2. RNAs are not always randomly distributed!

3. Why should RNAs be localized?

  • For starters, diffusion is in fact very limited!
  • Then also: RNAs don’t diffuse well at all!
  • And: the cytoplasm is not water!
  • Cells are not small and round.
  • Are all localized RNAs encoding localized proteins?
  • Localized RNAs = silenced pool?
  • Localized RNAs = not translated?
  • Evolution!

 

  1. Degree of RNA localization is not overrated but essentially unknown!

Assessing whether RNA localization is overrated or underappreciated will require a good number of studies. And RNA localization has simply not been studied very much at genome scale. Most genome-wide studies were performed on neurons, which have clear polarity and extensions that can be separated from the cell body for RNA isolation/sequencing. Less is known for non-neuronal cells. In Drosophila embryos a total of up to 70% of RNAs can localize – yet if this number seems overrated, remember it is a sum over the entire embryogenesis and all cell types present in an embryo. In adult tissue such as the ovary we saw a great degree of variability: the absolute numbers of localized RNAs varied from 0.1 to 10% of the expressed transcripts per cell type. But, more importantly, the percentage of localized RNAs also varied in one cell over time, suggesting that RNA distributions are context specific: we therefore suggest categorizing RNAs as ubiquitous versus localization-competent – the latter can enrich subcellularly but are not always localized.

The reason why genome-wide analyses of RNA distributions are rare is simply the huge amount of work each one still takes – for most cell types cutting off pieces does not work, cell fractionation to retrieve subcellular fractions is notoriously error-prone and “standard” in situ hybridization at genome scale is a lot of work. Even for single genes, in situ hybridizations often go wrong: they are often done with improper probes (too long, not clean) or with old-fashioned detection methods (NBT, BCIP) that don’t allow subcellular resolution, etc.

Two things need to happen to get us ahead in the field: we need probes for assessing RNA distributions in living tissues and we need topological sequencing methods. Both methods are currently being developed, but it’s still too early to say if they are the breakthrough. So in my book, time will tell how widespread RNA localization really is. Until then let’s postpone discussion of numbers – in the end does it make a difference if it is “only” 5% of RNAs? That is still a lot of transcripts! It will be much more interesting to see how localization-competent RNAs are regulated over time and in space!

  2. Are RNAs randomly distributed?

Well, we already know that many are not randomly distributed. And during my screen I often observed that RNAs encoding known localized proteins were also localized. In many instances (references upon request 😉) the authors had reported the RNA to be ubiquitous using a less sensitive approach.

RNAs also change their subcellular distribution, as we and many others have reported. They change localization over time, under stress, and when the cell undergoes other dramatic changes that also result in global changes of cellular organization, such as entering the cell cycle, becoming migratory or being infected by a virus.

Whether one can see RNAs in their localized states thus depends on a number of factors: the right detection method and integrity of the probes used, but also the cell type. All this matters if the RNA of interest is constitutively localized, and even more so if the RNA is one of the localization-competent RNAs that have dual states!

  3. Why should RNAs be localized?

For starters, diffusion is in fact very limited!

I also like Bionumbers a lot, just got the fantastic book, and it gives you the answer! While diffusion works really well at the scale of bacterial cells, its effectiveness rapidly declines with an increase in cell size: doubling of the distance results in four times the diffusion time.

In addition, diffusion is not equal in all cell types: macromolecular crowding, large immobile protein structures, and interactions with other molecules influence diffusion. And finally, molecules themselves do not all have equal diffusibility: this depends on protein type, size, whether it is part of heavy particles etc. While a GFP molecule can traverse a eukaryotic cell in as little as 1 second, for cellular proteins this is much slower: even small proteins like transcription factors already require 3-30 seconds. The larger the protein gets and the more interactions it has with other proteins, the lower its diffusion coefficient becomes. For example, it would take a ribosome ~8 minutes to cross a cell!
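
The "double the distance, four times the time" rule can be checked with a back-of-the-envelope calculation. The sketch below assumes the textbook relation for free diffusion, t ≈ x² / (2·n·D) with n the number of dimensions; the function name and the diffusion coefficients in the example are illustrative assumptions, not measured values from this post or from Bionumbers.

```python
def diffusion_time_s(distance_um, d_um2_per_s, dimensions=3):
    """Approximate time (s) to diffuse a given distance (µm).

    Uses the free-diffusion estimate t = x^2 / (2 * n * D), where n is the
    number of dimensions and D the diffusion coefficient in µm^2/s.
    Crowding and binding, as described in the text, are ignored here.
    """
    if distance_um < 0:
        raise ValueError("distance must be non-negative")
    if d_um2_per_s <= 0:
        raise ValueError("diffusion coefficient must be positive")
    return distance_um ** 2 / (2 * dimensions * d_um2_per_s)


if __name__ == "__main__":
    # Hypothetical diffusion coefficients, chosen only to show the scaling.
    for label, d in (("fast, small molecule", 10.0), ("slow, large particle", 0.05)):
        for distance in (10, 20, 40):
            t = diffusion_time_s(distance, d)
            print(f"{label}: {distance} µm takes about {t:.1f} s")
```

Whatever value is plugged in for D, the time grows with the square of the distance, so a lower diffusion coefficient hurts most in large or elongated cells.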

  4. Then also: RNAs don’t diffuse well at all!

First of all, RNAs are big! By definition, the open reading frame alone has three times as many nucleotides as the encoded protein has amino acids, and on top of that come the 5’UTR, the 3’UTR, introns and long poly(A) tails. The length of an unwound 1kb RNA in the cell is 300nm! And even in Drosophila 1kb is just the length of the UTR, in vertebrates they are much longer! Then this beast is highly negatively charged, i.e. likely tons of interactions are inhibiting its diffusibility. Then, to overcome the charge, RNAs are covered by spermidine, polyamines, proteins and what not – each molecule making the RNA less likely to diffuse fast. And even though proteins and amines are small and bundle the large RNA up – in the end it still has a four times bigger spherical expansion than the proteins.
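
The size argument above is simple arithmetic, sketched here in Python. The ~0.3 nm per nucleotide is the value implied by the 300nm for 1kb quoted above; the function names and the example protein and UTR lengths are made up for illustration.

```python
# Implied by the text: an unwound 1 kb RNA is ~300 nm long.
NM_PER_NUCLEOTIDE = 0.3


def orf_length_nt(protein_length_aa, include_stop=True):
    """Nucleotides needed to encode a protein: 3 per amino acid (one codon),
    plus optionally 3 for the stop codon."""
    return 3 * protein_length_aa + (3 if include_stop else 0)


def unwound_length_nm(n_nucleotides):
    """Approximate contour length (nm) of a fully unwound RNA."""
    return n_nucleotides * NM_PER_NUCLEOTIDE


if __name__ == "__main__":
    # Hypothetical transcript: 400 amino acid protein plus 1 kb of UTR.
    orf = orf_length_nt(400)
    total = orf + 1000
    print(f"ORF: {orf} nt, transcript: {total} nt, "
          f"unwound length: about {unwound_length_nm(total):.0f} nm")
```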

  5. And: the cytoplasm is not water!

Diffusion is fast in water, but alas, the cytoplasm does not resemble water much. It is heavily crowded making it really hard for any molecule larger than a GFP to just randomly move around. In addition, recent papers suggest that the cytoplasm under starvation, during the cell cycle and in changing pH etc can “freeze” and become a gel. (Search glass-like cytoplasm and any paper from Simon Alberti lab!)

  6. Cells are not small and round.

While we like to think of cells as little round balls as they appear in cell culture and in textbooks, they in fact are hardly ever round. Most cells in tissues are polarized: they have extensions, filopodia and asymmetries, and form extensive surface interactions, protrusions, bulges… Even cells that appeared round in old microscopes apparently look almost like neurons when observed at higher resolution! The role of RNA localization in establishing and maintaining such highly polarized structures in neurons is well established and could easily be more widely used (but to show this more people would need to work on it! We don’t have much data on cross-tissue comparisons of mRNA localization).

  7. Are all localized RNAs encoding localized proteins?

Probably not. Do we know for sure? No, so far we do not have a single good dataset globally comparing RNA and protein localizations (coming, provided I get the funding!). Even if there were little correlation between RNA localization and protein distribution, that could be interesting too, and we could understand more about the diverse roles of RNA in cells! Co-localization could enable complex formation, facilitate reactions, or serve as a backup mechanism for protein localization: for oskar RNA, more and more localization steps were discovered over the years that individually were not critical but in sum ensured germ cells could form (arguably, germ cell formation might be a more backed-up mechanism than RNA localization in somatic cells).

  8. Localized RNAs = silenced pool?

The localized state of RNA could also be a mechanism for translational silencing – similar to sequestration of RNAs into sponge/nuage/P-body type RNA-protein complexes. Do we know? No, again we have no genome-wide data. But most localized RNAs that have been studied in great detail so far are also under translational control at least for a period of their lifetime.

  9. Localized RNAs = not translated?

One exciting possibility is also that RNAs could have dual roles – protein coding and a structural role. This is in fact the case for oskar RNA in the fly: its early, 5-day long localization has nothing to do with encoding the protein, but is absolutely necessary for survival of the oocyte. In fact, for the early stages only ~100 nucleotides of the UTR are necessary – but they need to be localized!!!

  10. And finally, RNAs are localized in all life forms, algae, bacteria, yeast, many cell types, and also, RNA world… evolution, duh!

 

You see, RNAs are great and good for many things in cells! I look forward to a chance to discuss in much more detail over beer! In the end, we all agree bogus science is science that is crappily done, but no field itself is pointless to pursue – only time can tell what impact it will have.

 

 

Disclaimer: this list most likely is not complete! Am happy to update my list any time!