Are there any recommendations on normalizing spatial transcriptomics data before visualization? In the spatial scanpy tutorial, the gene expression is normalized like scRNA-seq data using normalize_total + log1p. In the squidpy visium tutorial, on the other hand, raw counts are plotted.
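For reference, a minimal sketch of the scRNA-seq-style normalization used in the scanpy tutorial (assuming `adata` holds raw Visium counts):

```python
import scanpy as sc

# scRNA-seq-style normalization: scale every spot to the same total count,
# then log-transform
sc.pp.normalize_total(adata, inplace=True)
sc.pp.log1p(adata)
```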
Personally I’m not convinced that normalize_total makes sense for spatial data, as:
- I’d assume there is less technical variability between spots than between droplets, and
- there are biological reasons for different spatial regions having different mRNA content (see also the example below).
But there should probably still be some normalization when comparing between samples. I’m wondering if something like scran makes sense here? As far as I understand, scran takes the different mRNA content of cells into account when computing scaling factors. Alternatively, maybe just normalizing at the sample level (e.g. normalizing each sample to 10k * n_spots_under_tissue) is appropriate?
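A minimal sketch of what that sample-level normalization could look like, assuming raw counts in `adata.X` and a sample label in a hypothetical `adata.obs["sample"]` column:

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp

# One scaling factor per sample: after scaling, each sample's total count
# equals 10_000 * n_spots, so relative count differences between spots
# within a sample are preserved.
spot_totals = np.asarray(adata.X.sum(axis=1)).ravel()
df = pd.DataFrame({"sample": adata.obs["sample"].to_numpy(), "total": spot_totals})
sample_totals = df.groupby("sample")["total"].transform("sum").to_numpy()
n_spots = df.groupby("sample")["total"].transform("size").to_numpy()
factors = 10_000 * n_spots / sample_totals  # identical for all spots of a sample
adata.X = sp.diags(factors) @ adata.X if sp.issparse(adata.X) else adata.X * factors[:, None]
```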
normalize_total in a tumor sample
In my experience normalize_total seems problematic, at least for the tumor slides I have, because different regions of the slide have vastly different mRNA contents. For instance, the tumor region has a much higher count density than the stromal or immune regions.
When plotting the normalized gene expression, the values appear much higher (although sparser) in the immune region, which is misleading:
[Figure: KRAS expression, log-normalized]
Here is the same plot with just log1p-transformed counts:
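A sketch of how such a plot could be produced, assuming the raw counts were stashed beforehand in a hypothetical `adata.layers["counts"]`:

```python
import scanpy as sc

# plot log1p of the raw counts, skipping normalize_total entirely;
# "counts" is a hypothetical layer holding the raw UMI matrix
adata.layers["log1p_raw"] = sc.pp.log1p(adata.layers["counts"], copy=True)
sc.pl.spatial(adata, color="KRAS", layer="log1p_raw")
```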
I was wondering the same thing.
While most publications do use normalize_total, this also seems a bit strange to me.
I saw Seurat’s sctransform and stLearn’s SME recommended here, as they do not assume that every spot/cell should contain the same total count.
Not sure how to best evaluate what works well and what doesn’t though.
The normalization from Cell2location using detection efficiency behaves well in my hands. For pure display purposes, if you don’t want to train a model, another round of count normalization after the logarithm also works well, as proposed by Lior Pachter (I don’t remember where).
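A minimal sketch of that second normalization round, assuming standard scanpy preprocessing (this is the idea as I understand it, not an official recipe):

```python
import scanpy as sc

# standard normalization, then log-transform
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
# second round: rescale the log values so every spot sums to the same total;
# useful for display only, not for downstream statistics
sc.pp.normalize_total(adata, target_sum=1e4)
```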
Thanks for the suggestion! Where do you get that from?
Is this `X / y_s`, where `X` is the raw UMI counts and `y_s` is “a location/observation-specific scaling parameter which adjusts for differences in sensitivity between observations and batches”, obtained using:
```python
def compute_expected(self, samples, adata_manager, ind_x=None):
    r"""Compute expected expression of each gene in each location. Useful for evaluating how well
    the model learned expression pattern of all genes in the data.
    """
    if ind_x is None:
        ind_x = np.arange(adata_manager.adata.n_obs).astype(int)
    else:
        ind_x = ind_x.astype(int)
    obs2sample = adata_manager.get_from_registry(REGISTRY_KEYS.BATCH_KEY)
    obs2sample = pd.get_dummies(obs2sample.flatten()).values[ind_x, :]
    mu = (
        np.dot(samples["w_sf"][ind_x, :], self.cell_state_mat.T) * samples["m_g"]
        + np.dot(obs2sample, samples["s_g_gene_add"])
    ) * samples["detection_y_s"][ind_x, :]
    alpha = np.dot(obs2sample, 1 / np.power(samples["alpha_g_inverse"], 2))
    return {"mu": mu, "alpha": alpha, "ind_x": ind_x}
```
Removing `samples["detection_y_s"][ind_x, :]` from this code would remove the detection efficiency and generate normalized expected values. Similarly, normalizing every observation by the same `samples["detection_y_s"][ind_x, :]` (i.e. dividing the counts by it) would give you normalized counts.
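A minimal sketch of the second option, assuming `samples` holds posterior estimates from a trained cell2location model (as in `compute_expected` above) and `adata.X` holds the raw counts; the layer name is hypothetical:

```python
import numpy as np
import scipy.sparse as sp

# divide raw counts by the learned per-location detection efficiency y_s;
# detection_y_s has shape (n_obs, 1) as used in compute_expected above
y_s = samples["detection_y_s"]
X = adata.X.toarray() if sp.issparse(adata.X) else np.asarray(adata.X)
adata.layers["detection_normalized"] = X / y_s  # broadcasts across genes
```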