Smart-seq data prep for scVI

Hi all,

Does anyone have any advice on how best to prepare a raw Smart-seq cell x gene count matrix for scVI?

Although the tutorials use a lot of Smart-seq datasets, they don't mention how those datasets were preprocessed.

My main question is:

  • Should I normalise the data before use, e.g. normalise for gene length, convert to TPM, etc.?

Any help would be much appreciated.

Here we normalize the counts to the median gene length and round to integers.
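In case it helps anyone reading later, here is a minimal sketch of that kind of scaling, assuming a dense cells x genes matrix of raw counts and a per-gene length vector in the same gene order. The function name `scale_by_gene_length` and the toy numbers are made up for illustration; this is not taken from the tutorial itself.

```python
import numpy as np


def scale_by_gene_length(counts, gene_lengths):
    """Scale counts to the median gene length and round to integers.

    counts: cells x genes array of raw counts (hypothetical input).
    gene_lengths: one length per gene, same order as the columns of counts.
    """
    counts = np.asarray(counts, dtype=float)
    gene_lengths = np.asarray(gene_lengths, dtype=float)

    if counts.shape[1] != gene_lengths.shape[0]:
        raise ValueError("need exactly one gene length per column of counts")
    if np.any(gene_lengths <= 0):
        raise ValueError("gene lengths must be positive")

    median_length = np.median(gene_lengths)
    # Divide out each gene's own length, then rescale to the median length
    # so the values stay on a count-like scale.
    scaled = counts / gene_lengths * median_length
    # Round back to integers, since the model expects count data.
    return np.rint(scaled).astype(np.int64)


# Toy example: 2 cells x 3 genes, made-up numbers.
counts = np.array([[10, 20, 30],
                   [0, 5, 60]])
gene_lengths = np.array([1000, 2000, 4000])
scaled = scale_by_gene_length(counts, gene_lengths)
print(scaled)
```

Genes longer than the median get scaled down and shorter ones get scaled up. For a sparse matrix you would want to scale the columns without densifying, which this sketch does not handle.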

Thanks Adam,
I had missed this. Having tried this now, I was wondering if I could pick your brains about using scArches across assays.

Having reference-mapped two Smart-seq datasets to a core atlas of 10x data, I am noticing that the Smart-seq data does not appear to map well (none of the cells are integrated).

Have you ever noticed this? I am wondering whether this is to do with the lack of a smart-seq dataset in the core atlas.

Any thoughts are much appreciated.

Hmmm, I would assume that global library size after the scaling might still be an issue. I would double check if they did anything else in the scIB paper as well.

To be honest, I haven’t done much reference mapping across technologies myself.
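One quick way to check the library size point is to compare total counts per cell between the technologies after the gene length scaling. A rough sketch, assuming a dense cells x genes matrix and one technology label per cell; the helper name `median_library_size_by_batch` and the toy values are hypothetical:

```python
import numpy as np


def median_library_size_by_batch(counts, batch_labels):
    """Return the median total count per cell for each batch label."""
    counts = np.asarray(counts)
    batch_labels = np.asarray(batch_labels)
    # Library size = sum of counts over all genes for each cell.
    library_sizes = counts.sum(axis=1)
    return {
        str(batch): float(np.median(library_sizes[batch_labels == batch]))
        for batch in np.unique(batch_labels)
    }


# Toy example: two 10x-like cells and two Smart-seq-like cells (made-up numbers).
counts = np.array([[5, 3, 2],
                   [4, 4, 4],
                   [500, 300, 200],
                   [400, 400, 400]])
batch = np.array(["10x", "10x", "smartseq", "smartseq"])
sizes = median_library_size_by_batch(counts, batch)
print(sizes)
```

If the medians still differ by orders of magnitude after scaling, that would be consistent with library size being part of the problem, though it would not prove it.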