Are there any recommendations on normalizing spatial transcriptomics data before visualization? In the spatial scanpy tutorial, the gene expression is normalized like scRNA-seq data using normalize_total + log1p. In the squidpy visium tutorial, on the other hand, raw counts are plotted.
Personally I’m not convinced that normalize_total makes sense for spatial data, as
- I’d assume there is less technical variability between spots than between droplets.
- There are biological reasons for different spatial regions having different mRNA content (see also the example below).
But there should probably still be some normalization when comparing between samples. I’m wondering if something like scran makes sense here? As far as I understand, scran takes the different mRNA content of cells into account when computing scaling factors. Alternatively, maybe just normalizing at the sample level (e.g. normalizing each sample to 10k * n_spots_under_tissue) is appropriate?
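To make the sample-level idea concrete, here is a minimal sketch of what I mean. It is plain numpy rather than a scanpy call, and normalize_per_sample, sample_ids and target_per_spot are names I made up for illustration. It assumes the count matrix only contains spots under tissue. Every spot in a sample is multiplied by the same factor, so differences in mRNA content between regions of a slide are preserved, while samples become comparable in total depth.

```python
import numpy as np


def normalize_per_sample(counts, sample_ids, target_per_spot=1e4):
    """Scale all spots of a sample by one shared factor (hypothetical helper).

    counts: (n_spots, n_genes) raw counts, in-tissue spots only (assumption).
    sample_ids: length n_spots array with a sample label per spot.
    After scaling, each sample sums to target_per_spot * n_spots_in_sample.
    """
    counts = np.asarray(counts, dtype=float)
    sample_ids = np.asarray(sample_ids)
    normalized = np.zeros_like(counts)
    for sample in np.unique(sample_ids):
        mask = sample_ids == sample
        total = counts[mask].sum()
        if total == 0:
            continue  # empty sample: leave its spots at zero
        factor = target_per_spot * mask.sum() / total
        normalized[mask] = counts[mask] * factor
    return normalized


# Toy data: two samples with two spots each (numbers are invented).
counts = np.array([[10, 30], [1, 3], [100, 100], [20, 20]])
sample_ids = np.array(["A", "A", "B", "B"])
norm = normalize_per_sample(counts, sample_ids)
```

In the toy data the first spot of sample A has ten times the counts of the second spot, and it still does after scaling. Only the overall depth of each sample changes.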
normalize_total in a tumor sample
In my experience normalize_total seems problematic, at least for the tumor slides I have, because different regions of the slide have vastly different mRNA contents. For instance, the tumor region has a much higher count density than the stromal or immune regions.
Fig1: log1p total counts
Fig2: spatial niches
When plotting the normalized gene expression, the normalized values appear to be much higher (although sparser) in the immune region, which is misleading:
Fig3: KRAS expression, log-normalized
Here is the same plot with just log1p transformed counts:
Fig4: KRAS expression, log1p-transformed
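A toy calculation that I think explains the difference between the two plots. The numbers are invented and are not taken from my slides, and the fixed target of 10,000 counts per spot is an assumption. It only mimics the per-spot scaling step in numpy. It does not call scanpy.

```python
import numpy as np

# Two hypothetical spots: one in the tumor region, one in the immune region.
total_counts = np.array([10_000.0, 500.0])  # total counts per spot (invented)
gene_counts = np.array([50.0, 5.0])         # raw counts of one gene (invented)

target_sum = 1e4  # assumed per-spot target after normalization

# Per-spot total-count scaling: every spot is forced to the same total.
scaled = gene_counts / total_counts * target_sum

log_normalized = np.log1p(scaled)   # analogous to the log-normalized plot
log_raw = np.log1p(gene_counts)     # analogous to the log1p-only plot

print("scaled counts:", scaled)
print("log-normalized:", log_normalized)
print("log1p of raw:", log_raw)
```

The tumor spot has ten times more raw counts of the gene, but after scaling the immune spot ends up with twice the value (100 vs. 50), simply because its total count is 20 times lower. The ranking of the two spots flips between the two transformations.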