helena * jambor

scientist interested in RNA, genomics and science visualizations

How big is a ribonucleic acid*?

I am often surprised by the real dimensions of biological entities compared with how they are shown in textbooks and scientific illustrations, and this is particularly striking for ribonucleic acid (RNA). Ribonucleic acids are not photogenic, as they move and wiggle, and textbooks show them as short strands bound by one or two proteins. Not really: ribonucleic acids are bundled up, associate with hundreds of proteins, cations, and other small molecules, and take up more space than proteins.

Quiz time! What is your guess for the physical length of a “typical human ribonucleic acid*” (let’s say 2-5 kilobases)? Don’t look it up! Draw it on the image below in relation to a human egg, a skin cell, a yeast, a bacterium, or a viral capsid and send it to [hjambor – at – gmail.com]; I’ll include it in the collection below. Or just post your guess in micro-, nano- or picometers in the comments!

 

Answer:

……

 

……

 

……

A single nucleotide, the smallest building block, spans 3.4 Angstrom, or 340 picometers, or 0.34 nanometers. Three nucleotides encode one amino acid of a protein, so a ribonucleic acid* contains at least three times as many nucleotides as the protein it encodes has amino acids. In addition, ribonucleic acids contain many nucleotides that serve only regulatory purposes: they promote or block protein translation, or they influence the stability and degradation of the RNA.


The average yeast ribonucleic acid is ~1500 nucleotides long (Miura, BMC Genomics, 2008), which adds up to a whopping 510 nanometers, or 0.5 micrometers, spanning a good portion of the length of the entire budding yeast cell!

The average human ribonucleic acid molecule is 2000 to 6000 nucleotides long, resulting in a physical length of 0.7 to 2 micrometers (Strachan and Read, 1999, Human Molecular Genetics). This is after a process called splicing, which removes about 60-80% of the nucleotides before a protein is even made from the RNA. Before splicing, right when they are transcribed from the DNA template, human ribonucleic acids are 3-5 micrometers long: longer than a virus capsid, a bacterial cell, or a yeast cell, and even longer than the diameter of the nucleus they are transcribed in! These are just averages; the longest human ribonucleic acids measure 100 micrometers (Titin) and even 600 micrometers (Caspr2). To fit inside a cell, and inside its nucleus, ribonucleic acids curl up and are compacted. Even in the cytoplasm, where they are shorter, ribonucleic acids take up a lot of space: on average about half of the genes are transcribed at any given time point, and each ribonucleic acid is typically present in multiple copies.
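For those who like to check the numbers: every length above is one multiplication, nucleotides times rise per nucleotide. A minimal R sketch, assuming a fully stretched strand and the 0.34 nm per nucleotide used in this post; the helper name `rna_length_um` is just a label I chose for illustration.

```r
# Rise per nucleotide used in this post (3.4 Angstrom = 0.34 nm).
# Assumption: the strand is fully stretched; real RNAs fold and compact.
nm_per_nt <- 0.34

# Hypothetical helper: convert a length in nucleotides into micrometers
rna_length_um <- function(n_nt, rise_nm = nm_per_nt) {
  n_nt * rise_nm / 1000  # nm -> um
}

rna_length_um(1500)           # average yeast mRNA, ~0.51 um
rna_length_um(c(2000, 6000))  # average human mRNA, ~0.68 to ~2.04 um
```

Running it backwards, 3-5 µm of unspliced RNA corresponds to roughly 8,800 to 14,700 nucleotides (length in nm divided by 0.34).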

Now compare your guess to the answers I got from molecular biologists: their replies varied from 10 nanometers to 100 micrometers! Mind you, my own guess was far off as well, even after having worked with localized ribonucleic acids for over 10 years!


What do we learn? Biological entities span ten orders of magnitude, so faithful representation of size is neither possible nor expected in illustrations that merely symbolize information. On the other hand, our visual memory is pretty good: once we have seen information as a picture, we tend to believe it. By memorizing false relative scales, we may thus lose important information that could help us interpret research data.

* For the enthusiast: I mean messenger ribonucleic acids (mRNAs), the class that encodes proteins. These are generally longer than other categories of RNAs that do not encode proteins, such as rRNAs and tRNAs, miRNAs, and piRNAs.

 


Conformation of the insulin receptor

A few days back, my fellow CNV grantee Theresia Gutmann from the Coskun lab casually told me over dinner about her PhD work. In collaboration with Rockefeller University in New York City, Theresia had visualized the changing conformation of the human insulin receptor upon insulin binding (paper). Having just started at the Center for Regenerative Therapies Dresden, with its focus on diabetes, I could not believe that this had not been done before! To honor her achievement, I made a #sketchnote of the discovery and a GIF explaining insulin in our body (below).


Paper: Gutmann, Kim et al. (2018): Visualization of ligand-induced transmembrane signaling in the full-length human insulin receptor. Journal of Cell Biology, DOI: 10.1083/jcb.201711047

 

 

Real viz coming soon, today status: tired!

I have a couple of things I want to prepare and show, but we, like many in Germany, are submitting a DFG Excellence Strategy grant next week. It’s a giant seven-year project with many, many players and a lot of coordination, politics and details… To relax in the evening, I do what I always did to calm myself: drawing!

More information design soon!

Mom, can you draw a unicorn for me…


Stop drawing me mom!


How big is stuff in biology?

Already in kindergarten, everyone can judge and compare sizes and lengths: which lollipop is biggest, that the Eiffel Tower is tall, and that matchbox cars are smaller than real ones. But it is rather difficult to grasp sizes at the microscopic scale, because we never see them with the unaided eye, and most of us only see images taken by others.

I have probably read hundreds of times that a cell is around 20 µm; I vaguely remember that many bacteria are a tenth of that size, because a difference of one order of magnitude is easy to remember. But how much bigger a cell is than a virus, and how much smaller than my finger, I look up again and again.

To help myself, I started drawing the relative sizes of various biological entities that fascinate me: myself (here: my thumb), a fruit fly (my model organism in research for 10 years), eggs of various sizes, cells, and my beloved ribosome, a wonderful machine made of many proteins and, importantly, RNA, which exists in every organism.


While making the drawings and looking up sizes, I was once more mesmerized to rediscover that a membrane lipid is not that much bigger than a water molecule! And that a human egg, which itself is 10 times larger than an “average cell”, is almost visible by eye! Also consider this: cells come in vastly different sizes. The longest cell in the human body is around one meter long, while the smallest is around 10 µm. In other words, cells can vary in size over five orders of magnitude, from 10 to 1 000 000 µm! That means, if you think of the smallest cell as a tennis ball, the largest would be roughly as tall as Mount Everest (and its nucleus is still the same size…)!
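The tennis ball comparison is a single ratio, so it is easy to check. A quick R sketch using the cell sizes from the text; the tennis ball diameter of about 6.7 cm is my assumption, not a number from this post.

```r
# Sizes quoted in the text, in micrometers
smallest_cell_um <- 10    # smallest human cell, ~10 um
longest_cell_um  <- 1e6   # longest human cell, ~1 m

# Orders of magnitude between the two: log10 of the ratio
orders <- log10(longest_cell_um / smallest_cell_um)
orders  # 5

# Scale a tennis ball (assumed diameter ~6.7 cm) by the same ratio
tennis_ball_m <- 0.067
scaled_m <- tennis_ball_m * longest_cell_um / smallest_cell_um
scaled_m  # ~6700 m, i.e. about 6.7 km
```

The scaled ball comes out at several kilometers, the same order of magnitude as Everest's roughly 8.8 km.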

Have fun looking through the comparisons! A beautiful inspiration is here.

PS: Also note how one can use both relative size and scale bars to show the size of an object! Please, never ever forget to add scale bars to your images; they are the only clue that allows your audience to relate the content to reality!


Visualize calendar data in R

Last week, I visualized the days I did sports in 2017 by hand with Illustrator. Most of the time, however, we want to re-tell similar data (I am not giving up sports anytime soon!), so I always look for ways to create visualizations computationally, for example in R. So today, I show you how to visualize data on the days of a year in R.

First step: googling (or: duckduck-ing) to find a package other people use for this type of visualization. To my surprise, this took a very long time! Apparently, there is no default package in R that can visualize calendar data!? I found a very laborious solution that someone made with ggplot (referenced here: https://www.r-bloggers.com/ggplot2-time-series-heatmaps/), but it needed more than 10 lines of code. I then stumbled upon another package, made specifically for visualizing air pollutants (!), but it also works for other data and is straightforward to use: “openair” takes any data frame with a “date” column in the standard format (YYYY-MM-DD) and plots whatever you define as the “pollutant”. In my case, the days I did sports were the “pollutant”.

3 easy steps:

  1. Read your data into R; I called my data frame “sports”.
  2. Load the package: library(openair)
  3. Plot: calendarPlot(sports, pollutant = "Sports", year = 2017)

Voila!

Red: days I did sports, yellow: lazy days, white: sick days.


calendarPlot() takes many more arguments, so you can adjust the color scheme, the labeling of the days, and so forth.
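If you want to try the three steps without your own diary at hand, here is a minimal, self-contained sketch. It assumes one row per day, a `date` column of class Date and a numeric `Sports` column; the 1/0/NA coding (sports, no sports, sick) and the random values are placeholders I made up, not my real data. The plot call only runs if openair is installed.

```r
# Placeholder data: one row per day of 2017 (made-up values, for illustration only)
days <- seq(as.Date("2017-01-01"), as.Date("2017-12-31"), by = "day")
set.seed(42)
sports <- data.frame(
  date   = days,
  # 1 = sports, 0 = no sports, NA = sick day (assumed coding)
  Sports = sample(c(1, 0, NA), length(days), replace = TRUE,
                  prob = c(0.6, 0.35, 0.05))
)

# If your dates come from a spreadsheet as text, convert them explicitly:
# sports$date <- as.Date(sports$date, format = "%Y-%m-%d")

# Plot only if openair is available, so the data preparation runs either way
if (requireNamespace("openair", quietly = TRUE)) {
  openair::calendarPlot(sports, pollutant = "Sports", year = 2017)
}
```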

Documentation:

http://www.openair-project.org/Downloads/Default.aspx

https://cran.r-project.org/web/packages/openair/index.html

 

 

A New Year’s resolution

*** UPDATE: below! *******

At some point in life, one has to start doing sports to stay healthy. At my new job at the CRTD I learned that exercise increases the number of brain cells and that bones stay strong when close to strong muscles (they actually receive signaling molecules telling them to stay young!). In the past years I also had my share of mental challenges, for which the positive influence of sports is widely known. My 2017 New Year’s resolution was therefore to do sports as often as possible.

As I love data and data visualizations, I tracked my progress daily. You can see that I steadily increased the share of days with sports to a whopping 90% in July! In January and February, before I started the diary, it was well below half of the days. It is easier to run, swim etc. in summer, and I could not keep this up in fall. But I am very pleased to see that I still do sports on two thirds of the days each month. It was difficult to keep it up while traveling and when I had evening appointments (November, the visit of the President of Germany), but even in hotel rooms doing planks for 10 minutes is feasible. Most of the days without sports were when I had visitors!


I also tracked exercise time, the kind of sports I did, and other aspects of my life such as my mood (hint: boring dataset, mainly correlates with my female cycle!), my food and my alcohol intake. I chose to show the alcohol intake alongside. Interestingly, there is no correlation between sports and alcohol: I do not drink on days when I feel too miserable for sports. Some days I drank a sip (light grey, a small sherry or so) after my sports, and some days I neither drank nor did sports.

My resolution for 2018: visualize data every day and blog about it as often as possible. To start, here is the making-of of this chart. Since I use my diary regularly, I recorded this data on paper:


I thought about how to present it best. I wanted to show my daily grind and therefore kept it in the calendar format.

I started out making a dot for each day in a simple table format (Step 1) and then adjusted the number of days per row and numbered them to get a week-like format (Step 2). Labeling the rows 1, 8, 15, as in a standard table, is of course counter-intuitive for a 7-day week format, so I changed the day labels to 7/14/21 (Step 3).

I then added the actual data: empty space for days without sports, a circle for days I did sports, and a cross for sick days (Step 4). Next, I started the graphic design part: decluttering wherever possible, playing with color and adjusting the layout if necessary. For example, the table-like grey boxes are not necessary (Step 5), and the lines separating the weeks are ugly, even in grey (Step 6)! Some guide is, however, needed to wade through the days. Gestalt principles show that white space is more effective for grouping than lines and boxes (Step 7). Using white space to separate the weeks made it necessary to change the “no sports” data points from white to light grey.

Last was to add some more information: a summarizing bar chart showing the percentage of days I did sports (not counting sick days), titles, axis labels, tick marks, and the data on my alcohol consumption (for the months I tracked it).

Finally, and always only at the very end, I added color, and my favorite is blue. Voila!
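The summarizing bars show the percentage of days with sports per month, with sick days left out of the denominator. A minimal R sketch of that calculation; the `diary` data frame, its `status` column with the values "sports", "none" and "sick", and the random values are hypothetical placeholders, not my real diary.

```r
# Hypothetical diary for three months: one status per day (made-up values)
days <- seq(as.Date("2017-01-01"), as.Date("2017-03-31"), by = "day")
set.seed(1)
diary <- data.frame(
  date   = days,
  status = sample(c("sports", "none", "sick"), length(days),
                  replace = TRUE, prob = c(0.6, 0.35, 0.05))
)

# Drop sick days before computing the share, as in the summary bars
healthy <- diary[diary$status != "sick", ]
month   <- format(healthy$date, "%m")

# Percentage of (non-sick) days with sports, per month
pct_sports <- tapply(healthy$status == "sports", month, mean) * 100
round(pct_sports, 1)
```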

 

************* UPDATE ****************

  • Holger commented that the bars summarizing each month should be shown in the same hue. They actually are, but with different opacity; I tried it without opacity.
  • Someone else wanted to see the months keyed to the weekdays, to check whether I hate sports on Mondays and love it on Sundays. Sadly, no pattern emerges.

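If you want to run the same weekday check on your own data, a tally per weekday is enough. A minimal R sketch; the `did_sports` vector is made-up placeholder data, and I use the numeric weekday from `POSIXlt` instead of `weekdays()` because the latter returns locale-dependent names.

```r
# Made-up placeholder: 52 full weeks starting on a Monday, TRUE = did sports
days <- seq(as.Date("2017-01-02"), by = "day", length.out = 364)
set.seed(7)
did_sports <- runif(length(days)) < 0.6

# Weekday via POSIXlt (0 = Sunday ... 6 = Saturday)
wday <- as.POSIXlt(days)$wday
day_names <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
weekday <- factor(day_names[wday + 1], levels = day_names[c(2:7, 1)])

# Share of days with sports per weekday, in percent
share <- round(tapply(did_sports, weekday, mean) * 100, 1)
share
```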

How to win a conference prize!

Or, at least, produce nice posters while trying.

Students on average author 1-3 papers and produce at least three times as many conference posters*. At large meetings, such as the ASCB, thousands of posters are presented each year. While presenting posters is popular, poster sessions evoke mixed feelings: they are often late in the evening, interrupted by special workshops, held in badly lit rooms far away from the bar, and many posters are subpar: they are crammed with details and small text, and their presenters elaborate at great length. Experienced conference attendees therefore excel at briefly scanning the title while avoiding eye contact with the presenter, for fear of being entangled in a never-ending run-down of experimental details.

While we can’t influence the conference organization, we can absolutely and with little effort improve the posters! Based on my survey data, I compiled the top ten tips to improve your poster:

  1. Legible title

Make the title and your name readable from afar. This means not too many words in the title, maybe 6 to 10, in a legible font such as Helvetica Neue, Verdana, Calibri or similar. Also refrain from all caps, as text in all caps becomes hard to read after a few words. If you love all caps, why not try small caps with capitalized initials instead.

  1. Avoid abbreviations

Ideally, use no abbreviations in the title and as few as possible in the poster content. Only a few abbreviations are so common that they have become words themselves: DNA, RNA, some gene and protein names. You don’t want to turn your audience away with jargon, and remember, even specialist conferences are attended by editors, journalists, and newcomers to the field: be welcoming to them all!

  1. Not too much text

We read at most 100-200 words per minute, and on posters, with scientific data, terms, and charts, our reading speed drops considerably. Keep that in mind: personally, I am more convinced by a figure than by you explaining and interpreting it.

  1. Clear section layout

Start at the top left and end at the bottom right. This is how we read text, and also posters! Alternative: arrange your content in 2-3 columns, similar to an article – make sure the columns are clear by leaving enough white space surrounding them! Please refrain from unconventional layouts – the chances are high that it will confuse your readers!

  1. Figure titles instead of legends

This is easy: try moving the figure legend above the image or chart, instead of showing it below as you would in a paper. Right away, this gives you a header for that section! Explanations of the color code, which are critical for understanding a figure, can be sub-headers!


  1. Consistent color code

Absolutely keep the color code consistent across all figures! Nothing kills more time than figuring out the color code of each individual chart! Please, if your main experiment/mutant/condition is shown in “red” in the first figure, do not deviate from this in the next figure! And, of course, be color-blind friendly (no mixing red and green!)


  1. Simple pictures and charts.

There is likely fascinating detail in your data, but not everyone wants to know all of it during a poster session. Therefore, please consider removing unnecessary details from your graphs! (Also: avoid 3D, no bar charts for distributions (#BarBarPlots), and avoid unconventional graph types: people are already unlikely to understand them in a paper, and even less likely to feel like deciphering them in a poster session.)

  1. Poster-Etiquette: Have the elevator speech ready!

Give your audience a polite overview in 2-3 minutes that includes the big picture and key finding, but leave out experimental details. If they are interested in more, they will ask! (Also, it is convenient to have this 2-minute blurb ready in case you accidentally bump into the heroine/hero of your field in the coffee line, instead of at the poster session!)

  1. Rehearse whenever you can!

Find 10 volunteers, not only your supervisor, to test your 2-minute presentation: while in the lunch line, when waiting for a measurement to finish, or when cleaning the bench.

  1. Tricks are allowed.

To get people interested in your poster, you can use tricks. Have handouts ready to take home or bring a laptop to show movies; I’ve seen people hand out sweets and know someone who served beer. Everything is allowed when trying to convince people to read your poster!

 

Further reading:

A really nice paper on how to give a poster presentation is here: “Producing punchy posters” by Bernard S. Brown, in Trends in Cell Biology, Vol. 6, 1996. He mainly deals with text and less with figures, but the paper has been helpful for me for 20 years!

 

* Unpublished results from survey, H.Jambor

 

Scales in scientific images

I recently saw drawings by Maria Sibylla Merian at the Kupferstichkabinett Berlin and the University Library Dresden. Merian, who lived from 1647 to 1717, is renowned for her exceptional illustrations of biological specimens and gained recognition as a scientist for her nature observations, for example of insect metamorphosis.

Maria Sibylla Merian (1647-1717) – “Das kleine Buch der Tropenwunder”, Insel Verlag, Leipzig Wiesbaden 1954, Public Domain, https://commons.wikimedia.org/w/index.php?curid=3319993

Merian evidently had a genius for choosing frame and magnification in her drawings, but her pictures lack indications of scale*, which are essential in today’s science images. Scales give the reader the key for aligning the image content with reality. To my knowledge, neither Merian nor her predecessors from Antiquity, Byzantium, or the Renaissance included scales in their medical and natural science images*. Even at the beginning of the 20th century, images were often considered a waste of space, and scales unnecessary, as scientists were familiar with each other’s apparatuses and objects. Today, however, we study invisible processes and structures that are unfamiliar to most of our colleagues, and we therefore have to include scales in our images.

Comment from Benjamin Moore in Nature (1910) when reviewing a biochemistry handbook.

We often include a familiar object of standard size in images for scale: a penny placed on a rock, a person standing beside a large animal or in a landscape, a measuring tape next to a fossil (or an earthworm!).

Bar = 1 cm (earthworm lovingly raised by Jeff Woodruff).

Using familiar objects for scale isn’t possible for tiny things. We don’t have a clear enough mental image of the size of a salt grain or a sesame seed to use them reliably as a scale for, for instance, cells**. We therefore include scale bars in microscopy images. With ImageJ/FIJI, files from any microscope system can be read in along with their scaling information (shout-out to Curtis and Melissa and the Bio-Formats project!). Using Analyze > Tools > Scale Bar, we can add a scale bar with a user-defined length, width, color, position, and label. Now the audience can calculate the actual size of objects and relate the image to reality.

Four tips for superb scale bars

  • Length: Be kind to your audience and use simple, round lengths, such as 100 µm, 50 µm, 10 µm or 2 µm.
  • Color: Scale bars should have high contrast with the background. Avoid red, green, or blue bars, as these colors might be mistaken for part of the image.
  • Position: The lower left corner is a safe place. The upper space should be kept for important information like species, cell type, or gene name.
  • Add scale bar last: While writing your manuscript, you may rethink the figure size, and images are also resized for posters and slides. It is therefore easiest to add only a very fine scale bar with FIJI and then redraw it in Adobe Illustrator (or PowerPoint, as I know that about half of you out there use PowerPoint for making figures and posters!).
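Behind the scale bar dialog is simple arithmetic: the bar length in pixels is the desired length divided by the calibrated pixel size, multiplied by any later resize factor. A minimal R sketch for redrawing a bar by hand; the function name `scale_bar_px`, the pixel size of 0.1 µm and the 50% downscaling are hypothetical example values, not taken from any real microscope.

```r
# Length of a scale bar in pixels for a given physical length.
# pixel_size_um: calibrated pixel size (e.g. from the image metadata)
# resize: factor by which the image was rescaled afterwards (0.5 = half size)
scale_bar_px <- function(bar_um, pixel_size_um, resize = 1) {
  bar_um / pixel_size_um * resize
}

scale_bar_px(10, pixel_size_um = 0.1)                # 10 um bar, original image
scale_bar_px(10, pixel_size_um = 0.1, resize = 0.5)  # same bar after halving the image
```

If you redraw the bar in Illustrator or PowerPoint after resizing a figure, this is the number to recompute.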

 

And finally, do not miss this article by Monica Zoppe with an interesting idea on how to communicate subcellular scales better!

 

* I’d be delighted to stand corrected, and if you find old scientific images with scale bars, or interesting scales, send them my way for my collection!

** A great tool to refresh your sense of comparable scales in biology is here: http://learn.genetics.utah.edu/content/cells/scale/.

I never cease to be amazed at the relative size differences of cells and how they vary over so many orders of magnitude!

#BarBarPlots

Recently a Kickstarter project raised more than 3000 EUR in one month to campaign for banning the wrong usage of bar plots in scientific journals. This demonstrates two important points: many of the plots in scientific journals are somewhat misleading, and a growing number of people feel very uneasy about this!

What exactly is wrong with bar plots? Nothing per se, but everything goes wrong if you use a bar plot for statistical data; this kind of plot is also infamous as the “dynamite plot”. We are talking about the famous vertical or horizontal boxes that often come in a dazzling array of colors or patterns, with big fat black outlines and overly prominent error bars.

Figure: Dynamite Plot (left) vs. Data Plot (right)

 

Are they common? Very much so! My personal survey [i] of “dynamite plots” in scientific journals revealed that 30-60% of articles use them, even in journals covering a wide range of subjects, including physics, meteorology and psychology, where authors typically have rigorous training in applied mathematics. The prevalence of dynamite plots increases towards life science journals, where 50-70% of articles are accompanied by a dynamite plot showing a statistical summary [ii].

Most of us are completely accustomed to dynamite plots and happily use them, that is, until we see the light. From then on it is impossible not to hate them, because it is so obvious that they are misleading and make reading the data harder than necessary! And, as scientists, we aim for clarity and for getting information across concisely!

The top reasons to avoid dynamite plots

  • They hide the real distribution of the data. Do all samples cluster closely? Do they form two groups? Or is there one drastic outlier? Generally, we assume a normal distribution of the data around the mean where there might not be one! In my survey, the share of dynamite plots per journal was more or less normally distributed.
  • They hide the sample size. From the bar plot you would not have known that I probed one issue of Nature, two issues of Cell and four issues of Development! But knowing the sample size is essential for properly evaluating scientific data! Too often we have to search for the n in the axis labels, the figure legend, the results, or the methods section to finally find this information, and sometimes it is omitted entirely. In my opinion, a clear understanding of sample size is also critical for the review process of a paper and should be demanded by the reviewers! Not showing data, or only showing summary data, should be treated like cropping Western blot bands!
  • Many different distributions of data can lead to the very same bar! See Anscombe’s quartet. Bar plots are not intended to show statistical distributions; they are for absolute numbers. By plotting the real data we also learn more about the biology!
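R ships with Anscombe’s quartet as the built-in `anscombe` data set, so the point is easy to check yourself: the four y columns have nearly identical means and standard deviations, and would therefore give nearly identical bars with error bars, although their scatter plots against x look completely different. A minimal sketch using only base R:

```r
# Anscombe's quartet is built into R (datasets package)
ys <- anscombe[, c("y1", "y2", "y3", "y4")]

# The summary statistics a dynamite plot would show
means <- sapply(ys, mean)
sds   <- sapply(ys, sd)
round(rbind(mean = means, sd = sds), 2)

# Plotting the raw data reveals four very different relationships
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), pch = 19)
}
par(op)
```

All four means round to 7.50 and all four standard deviations to 2.03.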

Not quite convinced? Seeing is believing, check out this figure:


(c) Page Piccinini and the #barbarplots campaign

For further information, watch the video of the Kickstarter campaign (British accent and humor alert!), ideally with your entire lab, followed by a discussion of this seminal paper on the wrong usage of bar charts and this survey of their prevalence in biomedical journals!

Practical advice to avoid dynamite plots

  • Plot charts with the statistical programming tool R. You have to either learn it or be really nice to someone who knows it: if your PhD requires 3 boxplots, maybe invest in a friendly relationship with the bioinformatics geek in your department; a couple of coffees go a long way!
  • Learn how to make box plots in Excel! (Here and here is how, but it’s a bit tedious.)
  • Can’t be bothered to do either? Use one of the available web tools, such as the boxplot maker from the Tyers lab or the plot generator from the University of Belgrade.
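For the R route, a box plot with every data point drawn on top takes only a few lines of base R. A minimal sketch; the two conditions, their sample sizes and the values are made-up placeholders, not real measurements.

```r
# Made-up example data: two conditions with different sample sizes
set.seed(3)
df <- data.frame(
  condition = rep(c("control", "mutant"), times = c(8, 12)),
  value     = c(rnorm(8, mean = 10, sd = 2), rnorm(12, mean = 13, sd = 3))
)
df$condition <- factor(df$condition)

# Box plot of the distribution; outliers are not drawn separately
# because every point is added on top anyway
boxplot(value ~ condition, data = df, outline = FALSE, col = "white",
        ylab = "Measurement (a.u.)")
stripchart(value ~ condition, data = df, vertical = TRUE,
           method = "jitter", pch = 19, add = TRUE)

# Report n per group, e.g. in the axis labels or figure legend
table(df$condition)
```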

 

[i] I probed the top 10 articles of Nature in July, the three most recent volumes of Science (August), four issues of Development (Vol 138, 1:3-2011 and Jan 2016), and two issues of Cell from 2016 (Jan and August). I was very relaxed and gave the benefit of the doubt when I wasn’t sure, but I was rigorous when authors mixed right and wrong usage of bar plots. How does this even happen? A mix of co-authors, some of whom know better than others?

[ii] Disclaimer: this does not mean the other articles have great figure design! I saw multiple uses of 3-dimensional pie charts, rainbow color schemes, other instances of unintentional usage of color, incomprehensible spider graphs and 3-dimensional heat maps! Maybe I will devote another blog post to those.

Color-blind people are your audience too!

This article is also on TheNode http://thenode.biologists.com/color-blind-audiences/photo/

Or, please stop mixing green/red

Color is a key aspect of graphic design, but for many years it was not relevant for scientific figures, which were largely black and white. Falling prices for color print and electronic publishing changed this dramatically, and scientists now frequently produce multi-colored figures. Using color functionally is not always straightforward, but a few rules exist, the most important being: do not combine red and green!

Already in 1939, Willard Brinton advised his readers not to use red letters on a green background, as they become invisible to color-blind people (and are hideous for the rest of us!). [His great book on data visualization is available for free here.] Decades later, browsing through the figures in scientific periodicals shows that this message has not reached everyone.

In charts, it is very straightforward to avoid mixing red and green. If you want to use red, combine it with blue or cyan; if you want to use green, combine it with magenta or orange. That way, color-blind people can also distinguish the data points. A side note: try starting a chart in black and white, and only add color if absolutely essential.

In laser microscopy, green and red fluorophores are widely used, often in combination. But simply because your fluorophore is excited at 488 nm does not mean you have to display it in green! The camera output has no color anyway, so you are free to choose a suitable lookup table. Why not be color-blind friendly and choose colors visible to your entire audience? Options that still preserve a little information on the wavelength are green/magenta or cyan/red. Again, consider showing two black-and-white images instead of a color composite. In fact, contrast is usually higher in greyscale, which benefits the display of structural details and subtle intensity differences.

*Rm62 RNA in Drosophila egg chambers, part of my postdoc project; find more subcellular RNAs on the Dresden Ovary Table.
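For those merging channels in R rather than FIJI, the green/magenta option simply means putting one greyscale channel into the green component and the other into both red and blue. A minimal sketch with a tiny made-up 2x2 example; the channel names `ch_488` and `ch_568` are hypothetical, and both channels are assumed to be already scaled to 0..1.

```r
# Two greyscale channels scaled to 0..1 (tiny made-up 2x2 example)
ch_488 <- matrix(c(1, 0, 0.5, 0), nrow = 2)  # displayed in green
ch_568 <- matrix(c(0, 1, 0.5, 0), nrow = 2)  # displayed in magenta (red + blue)

# Green/magenta merge: channel 1 -> G, channel 2 -> R and B.
# Where both channels are bright, the overlap appears white or grey
# instead of yellow.
composite <- matrix(
  rgb(red   = as.vector(ch_568),
      green = as.vector(ch_488),
      blue  = as.vector(ch_568)),
  nrow = nrow(ch_488)
)
composite

# To display: plot(as.raster(composite))
```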

Helpful tools:

  • Test color-blind visibility for your images here
  • Choose colors for categorical, sequential and diverging data in charts using ColorBrewer.

Comments suggesting more tools are very welcome!