Smartseq data prep for SCVI

Nusob888 · December 6, 2022, 11:25am

Hi all,

Does anyone have any advice on how best to prepare a raw smart-seq cell-by-gene matrix for scVI?

Although the tutorials use a lot of smartseq datasets, the preprocessing of such datasets is not mentioned.

My main question is:

Should I normalise the data prior to use, e.g. normalise for gene length, convert to TPM, etc.?

Any help would be much appreciated.

adamgayoso · December 7, 2022, 6:21am

Here we normalize the counts to the median gene length and round to integers.
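For readers who want to see what that scaling looks like, here is a minimal sketch, not the tutorial's actual code. It assumes a dense cells-by-genes count matrix and a vector of gene lengths in the same gene order; the function name `length_normalize_counts` and the toy numbers are made up for illustration.

```python
import numpy as np


def length_normalize_counts(counts, gene_lengths):
    """Scale each gene's counts to the median gene length, then round.

    counts: array of shape (n_cells, n_genes) with raw counts.
    gene_lengths: array of shape (n_genes,) with lengths in base pairs.
    Both inputs are assumptions about how the data is laid out.
    """
    counts = np.asarray(counts, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)
    if counts.shape[1] != gene_lengths.shape[0]:
        raise ValueError("gene_lengths must have one entry per gene (column)")
    if np.any(gene_lengths <= 0):
        raise ValueError("gene lengths must be positive")

    median_length = np.median(gene_lengths)
    # Divide out each gene's own length, then rescale to the median length
    # so the values stay on a count-like scale.
    scaled = counts / gene_lengths * median_length
    # Round back to integers (np.rint rounds halves to the nearest even value).
    return np.rint(scaled).astype(int)


# Toy example: 2 cells x 3 genes, hypothetical gene lengths.
counts = np.array([[10, 200, 0],
                   [5, 50, 30]])
gene_lengths = np.array([1000, 4000, 2000])
normalized = length_normalize_counts(counts, gene_lengths)
print(normalized)
```

With an AnnData object, the same arithmetic would be applied to the raw count matrix using whatever gene-length annotation you have in `adata.var`; the column name depends on your own annotation, and a sparse matrix would need to be handled separately.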

Nusob888 · December 9, 2022, 12:09am

Thanks Adam,
I had missed this. Having tried this now, I was wondering if I could pick your brains about using scArches across assays.

Having reference mapped 2 smart-seq datasets to a core atlas of 10X data, I am noticing that the smart-seq data does not appear to map well (none of the cells are integrated).

Have you ever noticed this? I am wondering whether this is to do with the lack of a smart-seq dataset in the core atlas.

Any thoughts are much appreciated.

adamgayoso · December 10, 2022, 2:56am

Hmmm, I would assume that global library size after the scaling might still be an issue. I would double check if they did anything else in the scIB paper as well.
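One quick way to check the library size concern is to compare per-cell total counts between the two datasets after the gene-length scaling. The snippet below is only a rough diagnostic sketch under assumed inputs: two dense cells-by-genes matrices with made-up toy values, and a hypothetical helper `library_size_summary`. It does not say how large a gap is a problem.

```python
import numpy as np


def library_size_summary(counts):
    """Return simple statistics of per-cell total counts (library sizes)."""
    library_sizes = np.asarray(counts).sum(axis=1)
    return {
        "median": float(np.median(library_sizes)),
        "min": float(library_sizes.min()),
        "max": float(library_sizes.max()),
    }


# Toy matrices standing in for the 10X reference and the scaled smart-seq query.
tenx_counts = np.array([[1, 0, 3],
                        [2, 2, 0],
                        [0, 1, 1]])
smartseq_counts = np.array([[200, 100, 0],
                            [50, 250, 300]])

tenx_summary = library_size_summary(tenx_counts)
smartseq_summary = library_size_summary(smartseq_counts)

# Ratio of median library sizes; a large value means the query cells carry
# far more total counts per cell than the reference cells.
median_ratio = smartseq_summary["median"] / tenx_summary["median"]
print(tenx_summary, smartseq_summary, median_ratio)
```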

To be honest, I haven’t done much reference mapping across technologies myself.
