Differences in library sizes between reference and query

Is anyone aware of benchmarks that have tested how large differences in library size (i.e. mean total counts per cell) affect integration with scVI and query mapping with scArches? I am working with an scVI model trained on a reference with significantly lower counts per cell than the query (reference mean total count ~1000, query mean total count ~5000). After query mapping, I see a relationship between a cell’s total counts and its similarity to the reference. I’m curious whether other people have looked at this before I go into downsampling experiments, since in my case a number of other biological factors are correlated with total counts per cell.
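For context, this is roughly how I’m quantifying “similarity to the reference” (a minimal sketch only; `adata_ref`/`adata_query`, the trained reference model `vae_ref`, and the query-mapped model `vae_q` are placeholders for my own objects):

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.neighbors import NearestNeighbors

# Latent coordinates for the reference and the mapped query cells
z_ref = vae_ref.get_latent_representation(adata_ref)
z_query = vae_q.get_latent_representation(adata_query)

# Distance from each query cell to its nearest reference neighbour in latent space
nn = NearestNeighbors(n_neighbors=1).fit(z_ref)
dist, _ = nn.kneighbors(z_query)

# Correlation between per-cell total counts and distance to the reference
total_counts = np.asarray(adata_query.X.sum(axis=1)).ravel()
print(spearmanr(total_counts, dist.ravel()))
```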

From the user guide, I gather that the default for the scVI model is to use the sum of counts per cell as the library size:

> the recent default for scVI is to treat library size as observed, equal to the total RNA UMI count of a cell.

Is training the reference model with `use_observed_lib_size=False` likely to make a difference here?
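For concreteness, this is the kind of change I mean (a sketch only; I’m assuming raw counts live in `adata_ref.layers["counts"]`, the batch column is `"batch"`, and that `use_observed_lib_size` is passed through to the underlying module):

```python
import scvi

# Register the reference counts with scvi-tools
scvi.model.SCVI.setup_anndata(adata_ref, layer="counts", batch_key="batch")

# With use_observed_lib_size=False, scVI learns a latent library-size factor
# per cell instead of fixing it to the observed total UMI count
vae_ref = scvi.model.SCVI(adata_ref, use_observed_lib_size=False)
vae_ref.train()
```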

Thanks a lot!

Hi, in my hands it’s usually not a good idea. Purely for the latent space it might be fine, but I wouldn’t trust downstream methods like `get_normalized_expression` or differential expression. In general, I would suggest training from scratch and checking whether the same structure shows up. It doesn’t necessarily mean that structure is a feature you are interested in, and additionally adding `total_counts` as a continuous covariate key might then help.
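Something along these lines, as a sketch (assuming total counts are already stored in `adata.obs["total_counts"]` and raw counts in `adata.layers["counts"]`):

```python
import scvi

# Register total counts as a continuous covariate so the model can account
# for count depth explicitly rather than encoding it in the latent space
scvi.model.SCVI.setup_anndata(
    adata,
    layer="counts",
    batch_key="batch",
    continuous_covariate_keys=["total_counts"],
)
vae = scvi.model.SCVI(adata)
vae.train()
```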

Hi @cane11, thanks for the tips. By training from scratch, do you mean training an scVI model on the concatenated reference and query, instead of using query mapping?

Hi Emma, yes, after concatenation, or just use the query data if you are not interested in the reference data.
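Roughly like this (a sketch, assuming both objects share a `"counts"` layer and a `"batch"` column):

```python
import anndata as ad
import scvi

# Combine reference and query on their shared genes
adata_full = ad.concat([adata_ref, adata_query], label="dataset", join="inner")

# Train a fresh model on the combined object, optionally registering
# total_counts as a continuous covariate as suggested above
scvi.model.SCVI.setup_anndata(adata_full, layer="counts", batch_key="batch")
scvi.model.SCVI(adata_full).train()
```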