Science visualization trends of 2022

It’s that time – top 10 lists of the past year! And since my domain is science, I once more reviewed research data visualizations. Here are a couple of trends that I noticed on science Twitter and in scientific journals throughout 2022. Disclaimer: this is not a survey but rather a personal collection for you to enjoy with me!

#1 Color schemes are literally all over the color wheel!

Colors can encode sequential data. In the life sciences, Jet/Rainbow used to be prevalent but have become less common as their pitfalls became known (new colormaps). Viridis then became the default color scheme in Python’s matplotlib, and since then more and more color schemes have been invented. It now requires attention to decide whether color encodes increasing, decreasing, divergent or continuous values. As a further complication, different color schemes sometimes co-exist – I once counted 5 different color schemes in one publication alone. (And, at times, legends are entirely missing, making interpretation impossible!)

Update: see also the color analysis by Xan Gregg, who compared the color schemes used in parallel in the Fourth National Climate Assessment: https://rawdatastudies.com/2023/01/07/climate-assessment-color-gradients/

#2 – Tufte’s grey shades at last!

Speaking of color, it seems more people have read their Tufte books! At least his command to use gray instead of black for background chart elements (tick marks, axes, control data etc.) was one of the biggest lessons I learned from him. I see gray quickly becoming popular to de-emphasize data that is not the focus – in line plots, t-SNE plots and for indicating treatment regimens in time-courses.

#3 – Hybrid chart-tables

This was a surprise – tables have been popular in genomics as heat maps for many years. This year, however, I saw a huge number of hybrid “table + area chart” combos! In these, the numbers are not encoded by color but by bubble size. And, to take things further, this is combined with color-coded bubbles to gain a fourth dimension (whether that is always understandable is another debate).

#4 – Genomic figures get smaller and smaller…

Fetch your magnifying glasses, genomic data plots get smaller and smaller as more and more data are squeezed into each panel and figure. The small plot in b literally has thousands of data points and all it needs to show is: there are clusters.

Maybe we should at some point ask ourselves “does this need a graphic?” or perhaps instead revert to summary statistics. Speaking of statistics…

#5 – Error bar? Nah, “T” suffices – the most iconic mishap of 2022!

The most iconic mishap of the year in scientific publications surely was the study that could not be bothered to plot error bars and instead simply placed the letter “T” on top of a bar chart! Needless to say, all bars had the same letter point size and thus identical error ranges. As the figure made the rounds on social media, the journal actually did care and swiftly retracted the publication.

The authors did not enlighten us on how they actually prepared this creative solution, but of course only a Christmas break later we now have an R package for the … Terror-bar! Do not use at home, folks! Check out the hilarious package description by Milan Valášek: https://mival.netlify.app/blog/2023/01/introducing-geom_terrorbar/

#6 – Axis breaks…..

The good news first: bar charts for summary statistics have mostly been eradicated (in high-impact journals at least…). Fewer bar charts should also mean fewer axis breaks, right? Wrong! Instead, it means more inventive axis breaks! First, bar charts for quantities are still popular, with and without axis breaks – and sometimes with two breaks! Second, and a lot more worrying, scientists (n=1) invented a way to also break the axis of dot plots – let’s hope this does NOT become a new trend!

#7 – Icon libraries for graphical abstracts

Oh yes, graphical abstracts keep surging in popularity, in fact I get a ton of requests to help with those. And I like making them, it’s fun and I can play around with new ideas a bit! Thanks to Biorender and its large icon library, making graphical abstracts has become a piece of cake for all scientists, but it comes with a hefty price tag.

Here are a couple of free icon libraries:

#8 – Climate stripes are still omnipresent

This year was, again, the warmest ever recorded, making climate change really the biggest topic, at least on my radar. While only symbolic, I did give up traveling by plane for short distances and have now, post-Covid, explored the night train options!

The iconic chart displaying the science behind the climate analysis is the warming stripes by Ed Hawkins from the University of Reading. This year they were used as the cover for Greta Thunberg’s book, at train stations, as fashion art and more. And we have also seen new versions illustrating that along with increasing temperatures comes a loss of biodiversity.

#9 – Images are big

2022 was once again a year of images – new telescopes imaged far away stars and biologists imaged deep into the angstrom scale. But AI is really starting to shape our daily life. From AI-assisted image generation tools such as DALL-E that kept all of us captivated to new AI-assisted image analysis, there was something for everyone. In the hospital, I am most impressed by how we are reaching a stage where clinical images can indeed be screened by robots faster and better than by medical staff!

Some good news: scale bars in images are a thing (see our survey on scale bars). And in personal news, our QUAREP initiative will soon publish our community-developed guidelines for image publishing. Stay tuned!

#10 – Cutie of the year: fluffy t-SNE plots

On trend with fluffy hats and fluffy jackets, t-SNE plots also became fluffy in 2022!

Can’t wait to see what 2023 will bring!

Best wishes,

Helena

Survival plots – Medical charts 1

Not too long ago, I transitioned from molecular biology to clinical oncology research. In the hospital, I adapted to a different pace, visible hierarchies, and learned patience as patient care naturally takes precedence over research! I also got used to new data science environments with specific requirements for documentation, privacy, and ethics. And of course, I learned about a number of new visualizations and plots that are common in clinical data reporting! I will introduce a few of these in this article-series.

Anatomy of a survival plot

A common chart type in clinical oncology is the ‘survival’ plot, also known as the Kaplan-Meier plot. However, neither name appears in your standard chart guides or references (https://r-graph-gallery.com/ or https://datavizcatalogue.com/home_list.html). Survival plots generally look like a line plot, with time shown on the x-axis. The time range depends on the clinical trial and its defined endpoints – and may be anywhere from minutes or days to years. Listed time points are the follow-up appointments and as such are neither at regular intervals nor evenly spread.

The y-axis

The tricky bit is of course the y-axis. The data points do not encode directly measured values but instead a ‘survival probability’ at the given time point. Naturally, the ‘survival probability’ decreases over the study time for any cohort. As time passes, more and more patients will invariably experience an event that was previously set as the study endpoint. Study endpoints typically are survival, hence the name of the plot, but they may also be the recurrence of a cancer or a positive event like leaving the intensive care unit.

The survival probability is then re-calculated only for those participants still enrolled in the study at the given time point. Towards the very end of the curve, when only one or two patients are still observed, the curves change most drastically. This, however, only reflects the relatively large effect a single event has on a small set of remaining study participants.
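To make this re-calculation concrete, here is a minimal sketch of the Kaplan-Meier (product-limit) calculation in plain Python. The function name `kaplan_meier` and the six-participant cohort are made up for illustration, and participants flagged `False` are assumed to have left the study without experiencing the event. For real analyses an established survival analysis package is the safer choice.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for every time with an event.

    durations: follow-up time of each participant
    observed:  True if the endpoint event happened at that time,
               False if the participant left the study without the event
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    leaving = Counter(durations)  # everyone leaves the at-risk pool at their own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events[t]
        if d:
            # only participants still at risk enter the denominator
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical cohort of six participants (times in months)
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, True]

for t, s in kaplan_meier(durations, observed):
    print(f"t={t}: S(t)={s:.3f}")
```

In this made-up cohort the last event halves the curve from 0.444 to 0.222, simply because only two participants were still under observation at that point.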

The categories

Survival plots most often compare survival probabilities for two conditions. These could be how patients survive under a new care regime compared to the standard of care, or under two alternative medications. But survival plots can also be used to illustrate different response groups such as male and female study participants, or patients stratified by age.

Often the confidence interval is plotted along with the survival probability for each category and can help to gauge the uncertainty. This is especially important towards the end of the curve, when fewer participants also mean increasingly larger uncertainty in the data.
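One common way to obtain such an interval is Greenwood's formula for the variance of the survival estimate. The sketch below is only an illustration under stated assumptions: the function name `greenwood_interval`, the input layout (time, number at risk, number of events) and the example numbers are all hypothetical, and it uses the plain, untransformed interval clipped to the range 0 to 1, whereas statistical packages often apply a transformed version.

```python
import math


def greenwood_interval(life_table, z=1.96):
    """Return [(time, survival, lower, upper)] rows.

    life_table: list of (time, n_at_risk, n_events), sorted by time
    z:          normal quantile, 1.96 for an approximate 95% interval
    """
    survival = 1.0
    variance_sum = 0.0
    rows = []
    for time, n, d in life_table:
        survival *= 1 - d / n
        if n > d:
            # Greenwood term; skipped when everyone at risk had the event (S = 0)
            variance_sum += d / (n * (n - d))
        se = survival * math.sqrt(variance_sum)
        lower = max(0.0, survival - z * se)
        upper = min(1.0, survival + z * se)
        rows.append((time, survival, lower, upper))
    return rows


# Hypothetical life table: (time in months, at risk, events)
life_table = [(2, 6, 1), (3, 5, 1), (5, 3, 1), (8, 2, 1)]

for time, s, lo, hi in greenwood_interval(life_table):
    print(f"t={time}: S={s:.3f}  CI=[{lo:.3f}, {hi:.3f}]")
```

With so few participants the interval spans most of the plot, which is exactly the uncertainty a shaded band in a survival plot is meant to convey.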

Additional decorations

Survival plots frequently label the time at which the survival probability reaches 50% for each cohort (the median survival time) and often include a table below the plot explicitly listing the patients at risk at specific time points.

Patients may also leave a clinical study before the observed event occurs. They may die from another cause, get well and leave a trial, or stop participating in the study. The participant is then removed from the analysis from that time point onward, which is termed censoring. This reduces the pool of persons at risk, even if no patient reached the endpoint. In the survival plot, these persons are often marked with a tick mark on the respective category line.
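Both decorations can be derived from the same data as the curve itself. The following is a small sketch with hypothetical helper names (`number_at_risk`, `median_survival`) and made-up numbers. It assumes a participant counts as "at risk" at a time point if their follow-up lasted at least until then, and it takes the 50% mark as the first time the curve drops to 0.5 or below.

```python
def number_at_risk(durations, time_points):
    """Count participants still under observation at each time point.

    Participants who had the event or were censored before a time point
    are no longer counted.
    """
    return {t: sum(1 for d in durations if d >= t) for t in time_points}


def median_survival(curve):
    """First time at which the survival probability is 0.5 or lower.

    curve: list of (time, survival_probability), sorted by time
    Returns None if the curve never reaches 50%.
    """
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical follow-up times (months), including censored participants
durations = [2, 3, 3, 5, 8, 8]
risk_table = number_at_risk(durations, [0, 4, 8])
print(risk_table)

# Hypothetical survival curve for the same cohort
curve = [(2, 0.833), (3, 0.667), (5, 0.444), (8, 0.222)]
print(median_survival(curve))
```

Note that the at-risk count shrinks for censored participants as well, even though the curve itself only steps down when an event occurs.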

Variations

The very same data and survival probabilities can of course also be plotted differently. For instance, instead of focusing on the entire range of the data, a zoomed view of the early time points may help to understand critical differences in treatment. At these time points the confidence interval is still narrow and differences are likely to be meaningful.

There are also versions that flip the y-axis. Instead of showing the survival probability, the plot then focuses on how more and more events occur over the observed time course and either sums them up (cumulative events) or indicates the likelihood of the event (cumulative hazard).
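Both flipped versions can be computed from an existing survival curve. The sketch below assumes the commonly used transformations: the cumulative event probability is 1 minus the survival probability, and the cumulative hazard is the negative natural logarithm of the survival probability. The function name `flip_curve` and the example values are hypothetical, and dedicated estimators for the cumulative hazard exist that may give slightly different numbers.

```python
import math


def flip_curve(curve):
    """Convert survival probabilities into the two 'flipped' quantities.

    curve: list of (time, survival_probability)
    Returns [(time, cumulative_events, cumulative_hazard)].
    """
    rows = []
    for time, survival in curve:
        cumulative_events = 1 - survival
        # -log(S) is undefined at S = 0, so report infinity there
        cumulative_hazard = -math.log(survival) if survival > 0 else math.inf
        rows.append((time, cumulative_events, cumulative_hazard))
    return rows


# Hypothetical survival curve (time in months, survival probability)
curve = [(2, 0.8), (5, 0.5), (8, 0.25)]

for time, events, hazard in flip_curve(curve):
    print(f"t={time}: cumulative events={events:.2f}, cumulative hazard={hazard:.2f}")
```

Both curves rise over time, so they read from bottom left to top right instead of top left to bottom right.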

References & Try it out

Upcoming articles:

  • Medical charts 2: Trial/study diagrams
  • Medical charts 3: Forest plots
  • Medical charts 4: Common pitfalls

30+ charts for #30DayChartChallenge

The #30DayChartChallenge is an annual festival for chart-lovers around the world. There is a daily visualization challenge and participants post their own chart solutions. Datasets, data analysis, tools, and presentation are entirely up to the contributor. All entries are posted on Twitter along with the #Day1 to #Day30 hashtag. The resulting diversity and personal interpretations are the charm of the challenge and a joy to data scientists and visualizers alike.

  • Charts showing aspects of Ukraine life

My 30DayChartChallenge

I personally was looking for an excuse to get back to making charts with R. All I could think of at the time was however the war in Ukraine. My granddad Anton Jámbor was born and raised in Khust, Zakarpattia Oblast, which was another reason to finally educate myself about Ukraine. So, I made a plan and almost completed it! I researched the people and population of Ukraine, the country and cities, and a little bit the health data since my day-job is in medical research at the University hospital.

From idea to implementation:

Moving along: the daily charts

Over the course of April I completed 18 of the 30 challenges. Many times I made several solutions and tested many more. Below is the example for Kazimir Malevich. I plotted the color schemes he used in his paintings over his lifetime and tried out several options, including an animation.

Finding time each day for a full month was impossible with Easter vacation and care work requiring my attention. Now, in a calm summer, I finally completed five more challenges for which I had already found data. I posted these final entries on Ukraine’s Independence Day, coinciding with 6 months since the war started.

Kazimir Malevich was born in Ukraine and is famous for monochrome, abstract paintings. However, that was a brief period; before and after, his paintings are full of colors! I analyzed his color usage over his lifetime. This was one of the few datasets I created myself.

Design objective

My goal was to create a visually and thematically cohesive work. I plan to use my charts for teaching the many ways we can communicate data with charts. Most of my time, as always, was spent on finding a topic for which a dataset was available. A lot of time was also consumed by data wrangling and getting the data into a format to actually make charts. And only a fleeting moment was left for creating the visualizations. That’s how it goes!

Enjoy, and Slava Ukraini!

Keep Ukraine in the news!
Animation of Google searches for Zelensky, Putin and Russia

Conformation of the insulin receptor

A few days back, my fellow CNV grantee Theresia Gutmann from the Coskun lab casually told me over dinner about her PhD work. In collaboration with the Rockefeller University NYC, Theresia had visualized the changing conformation of the human insulin receptor upon insulin binding (paper). Having just started at the Center for Regenerative Therapies Dresden with its focus on Diabetes, I could not believe that this had not been done before! To honor her achievement, I made a #sketchnote of the discovery and a GIF explaining insulin in our body (below).

Paper: Gutmann, Kim et al. (2018): Visualization of ligand-induced transmembrane signaling in the full-length human insulin receptor. Journal of Cell Biology, DOI: 10.1083/jcb.201711047