Tables, the ancient gold of DataViz.
Tables are one of the most successful visualizations in the history of science. They existed long before charts were invented. Miescher reported the discovery of nucleic acids in a small table with just three rows. Janni Nüsslein-Volhard and Eric Wieschaus compiled their observations that gene expression controls embryogenesis in a table. And the seminal discovery that cells are enclosed by a lipid bilayer warranted a full-page table:
Tables are still common in scientific papers and presentations today. In a recent issue of Nature, 70% of life science manuscripts incorporated some form of table. Simple tables are often relegated to the supplemental section. Tables in the main section of a manuscript may also appear in a fancier format, such as heatmaps, HiC plots, or databases. This common use of tables explains the high interest in the subject among participants of my DataViz classes.
Data suitable for tables
When seeing a figure, we focus on its most prominent features: thick lines, tallest bars, patterns in a scatter plot. If the figure is done well, this gives us instant insight into the main message. Tables display text and numbers in an organized form. When seeing a table, we approach it like a text: we read from top left to bottom right. And this is the intention of a table: it presents a complete dataset without a punchy key message. Tables force the reader to come up with a conclusion and thus are more work. And tables quite simply require more space than any summary chart.
Despite these disadvantages, tables are very useful for reporting precise numbers and for precisely comparing numbers across rows and columns. Tables are also great for presenting large datasets in which every member of our audience is interested in a different aspect of the data. Each reader of the table on ERC starting grants will likely look up their country of residence first. If, however, I wanted to show which country gets the most ERC grants, a bar chart works better (for categorical data, not statistical summaries). One can immediately spot the longest bar, and read it even faster when the data is sorted.
Designing a clear table
Because tables are organized text, alignment and typography are critical for their legibility, and good legibility allows faster reading.
Organize rows and columns.
Which data is used as the key for presenting the result? This goes into the first column of every row. In our case these are the country names. All observations for a country (ERC grants, population, funding rate) get a column each.
Alignment is your friend.
Text is best left aligned, where we start reading text. Numbers are right aligned to make comparisons by digits along a column easier. The column headers are aligned with the content. That means the header for a text column is left aligned, the header for a number column right aligned.
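The alignment rules above can be sketched in a few lines of Python. This is only a minimal illustration in plain text: `render_table` is a hypothetical helper, and the country names and grant counts are made-up placeholders, not real ERC data.

```python
# Placeholder rows: (country, number of grants). Values are invented for illustration.
rows = [
    ("Country A", 120),
    ("Country B", 8),
    ("Country C", 45),
]

def render_table(rows, header=("Country", "Grants")):
    """Return a plain-text table: text left-aligned, numbers right-aligned."""
    # Each column is as wide as its longest entry, header included.
    w_text = max(len(header[0]), *(len(name) for name, _ in rows))
    w_num = max(len(header[1]), *(len(str(n)) for _, n in rows))
    # Headers follow the alignment of their column content.
    lines = [f"{header[0]:<{w_text}}  {header[1]:>{w_num}}"]
    for name, n in rows:
        lines.append(f"{name:<{w_text}}  {n:>{w_num}}")
    return "\n".join(lines)

print(render_table(rows))
```

In a monospaced rendering, the digits of the right-aligned column line up by place value, so the largest number is easy to spot.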
Choosing a legible font is always great, but really critical for numbers. To be legible, all numerals should have the same height (“lining” figures) and the same width (“tabular/monospace”). The same width is important for comparing digits within columns; the same height (so no ascenders and descenders in numbers) looks less cluttered overall.
It is not possible to make an easy recommendation for a specific font, since fonts are modified depending on the operating system and program. For example, Arial has proportional characters, which would not work for numbers in a table. Microsoft Word/Mac has adapted the numbers to be tabular, while Adobe Illustrator/Mac has not.
To best compare numbers in a column, they should have the same number of decimal places. This helps both alignment and understanding. In general, it is always worth considering whether the decimals are even important: the number of grants does not warrant a decimal because it can only be a whole number. The same holds for humans, but a population in millions might reasonably be presented with one decimal.
With the above rules, you usually end up with an organized and legible table. But the good news is, you can do more! Removing left and right borders is a fantastic way to create extra space and use the entire width of a table cell for the content – sometimes rather important for grants!
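As a small sketch of the decimal rule, the snippet below formats a whole column with one fixed number of decimals. `format_column` is a hypothetical helper and the population and grant values are placeholders chosen for illustration.

```python
def format_column(values, decimals):
    """Format all numbers with the same number of decimals, right-aligned."""
    texts = [f"{v:.{decimals}f}" for v in values]
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]

# Placeholder populations (people), shown in millions with one decimal.
populations = [83_200_000, 5_380_000, 17_946_000]
in_millions = format_column([p / 1e6 for p in populations], 1)

# Counts of grants are whole numbers, so zero decimals.
grants = format_column([120, 8, 45], 0)

for g, p in zip(grants, in_millions):
    print(g, p)
```

Because every entry in a column has the same number of decimals, the decimal points end up directly below each other once the column is right-aligned.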
You can do still more. The minimalist Edward Tufte even says ‘every pixel should have a meaning’. In this spirit, removing all gridlines often works just as well. Often the well-aligned content is sufficient to guide the eye through columns and rows.
Very long tables, however, are hard to read without any guides. Two options exist. Either the content is grouped into blocks of 5 or 10 rows each, with the blocks visually separated from each other by white space. Alternatively, the table could include a horizontal gridline every 5 or 10 rows.
Last, and only last, think also about color (black and grey are colors too!). I personally like highlighting table headers by giving them a fill color. Usually a light grey works fine, but this very much depends on the overall presentation and the purpose. If shown on a projector, some light greys don’t work anymore. Also think of color coding (and never mix red and green). I consistently used pink, so using the same color for the header would maybe give it a coordinated look. If your fill color is dark, you have to think about white labels. And they in turn require larger fonts to achieve equal legibility.
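The first option, grouping rows into blocks, could look like this in a plain-text table. `group_rows` is a hypothetical helper and the rows are placeholders.

```python
def group_rows(lines, block_size=5):
    """Insert an empty line after every `block_size` rows of a long table."""
    grouped = []
    for i, line in enumerate(lines):
        if i > 0 and i % block_size == 0:
            grouped.append("")  # white space separating two blocks
        grouped.append(line)
    return grouped

rows = [f"Row {n:>2}" for n in range(1, 13)]  # 12 placeholder rows
print("\n".join(group_rows(rows)))
```

For the second option, the empty string could be replaced by a thin rule, for example a line of hyphens as wide as the table.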
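Whether a header fill is dark enough to need white labels can be estimated rather than guessed. The sketch below is one possible approach, not a rule from this text: it uses the WCAG-style relative luminance and contrast ratio, and picks whichever label color contrasts more with the fill. The function names and the example colors are assumptions for illustration.

```python
def relative_luminance(hex_color):
    """Relative luminance of an sRGB color such as '#d9d9d9' (0 = black, 1 = white)."""
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5)]
    # Undo the sRGB gamma curve before weighting the channels.
    linear = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
              for c in channels]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast(lum_a, lum_b):
    """Contrast ratio between two luminances (1 = none, 21 = black on white)."""
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)

def label_color(fill):
    """Pick 'white' or 'black' labels, whichever contrasts more with the fill."""
    lum = relative_luminance(fill)
    return "white" if contrast(lum, 1.0) > contrast(lum, 0.0) else "black"

print(label_color("#d9d9d9"))  # a light grey header fill
print(label_color("#333333"))  # a dark grey header fill
```

This only addresses contrast on a calibrated screen; a projector can still wash out light fills, so a test on the actual device remains worthwhile.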
Heatmap. A heatmap is a matrix/table in which the cells are shaded according to a color scale representing an observed value. Heatmaps are particularly useful for representing many-to-many comparisons. They can display very large matrices at a very small scale and allow us to rapidly compare numbers and even see coherent patterns in the data. Heatmaps may also be combined with clustering algorithms (or simple sorting by value), which facilitates seeing patterns in data. Heatmaps are not useful for reading off precise numbers.
Microarrays. A heatmap that shows gene expression levels across samples. Gene expression is shown as relative expression compared to a reference state. Upregulation is shown in green, downregulation in red (microarrays are thus not color-blind safe).
HiC plot. HiC plots are heatmaps in which each pixel represents counts of DNA interactions between two genomic regions. The pixel intensity indicates the number of reads (one color scale) or the divergence of reads from a control (dual color scale). The axes each show the genomic regions that are compared, usually binned to e.g. 1 Mb.
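The core step of a heatmap, turning each value into a shade, can be sketched as follows. This is a minimal example with a single grey scale; `to_shade` is a hypothetical helper and the matrix holds placeholder values.

```python
def to_shade(value, vmin, vmax):
    """Map a value onto a white-to-black scale, returned as a hex color."""
    frac = 0.0 if vmax == vmin else (value - vmin) / (vmax - vmin)
    level = round(255 * (1 - frac))  # high values give dark cells
    return f"#{level:02x}{level:02x}{level:02x}"

matrix = [[0, 5, 10],
          [10, 5, 0]]  # placeholder observations

flat = [v for row in matrix for v in row]
vmin, vmax = min(flat), max(flat)
shades = [[to_shade(v, vmin, vmax) for v in row] for row in matrix]

for row in shades:
    print(" ".join(row))
```

The mapping also shows why precise numbers are lost: many nearby values collapse onto shades that the eye cannot tell apart.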
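The binning step can be illustrated with a toy example. The sketch assumes that interactions on a single chromosome are already available as pairs of genomic positions; real pipelines also involve read mapping, filtering, and normalization, which are left out here. The function names and the positions are placeholders.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins

def bin_contacts(contacts, bin_size=BIN_SIZE):
    """Count interactions per pair of genomic bins."""
    counts = Counter()
    for pos_a, pos_b in contacts:
        i, j = pos_a // bin_size, pos_b // bin_size
        counts[(min(i, j), max(i, j))] += 1  # (i, j) and (j, i) are the same pair
    return counts

def to_matrix(counts, n_bins):
    """Expand the pair counts into a symmetric matrix, one cell per pixel."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for (i, j), n in counts.items():
        matrix[i][j] = n
        matrix[j][i] = n
    return matrix

# Placeholder interactions: (position of first end, position of second end), in bp.
contacts = [(150_000, 2_300_000), (900_000, 2_900_000),
            (1_200_000, 1_800_000), (2_500_000, 100_000)]

matrix = to_matrix(bin_contacts(contacts), n_bins=3)
for row in matrix:
    print(row)
```

Each cell of the resulting matrix would then be shaded by its count to give the pixel intensity.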
Database: online formats of tables to present a large dataset.
Table-Chart hybrid: A table with several observations in columns, with one of the observations presented as a small chart (dot plot, box plot, bar plot) adjacent to the respective row. The chart column usually highlights a particularly important observation.
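A plain-text version of such a hybrid can be sketched by adding a small bar next to each number. `bar` is a hypothetical helper and the countries and values are placeholders.

```python
def bar(value, vmax, width=20):
    """Return a text bar whose length is proportional to value / vmax."""
    return "█" * round(width * value / vmax)

rows = [("Country A", 120), ("Country B", 8), ("Country C", 60)]  # placeholder data
vmax = max(value for _, value in rows)

# Sort by value so the longest bar comes first.
for name, value in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{name:<10} {value:>4}  {bar(value, vmax)}")
```

The number column keeps the precise value, while the bar column gives the quick visual comparison.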