Hi,
I am working with 10x Multiome data. I first built an scVI model on the RNA portion and a PeakVI model on the ATAC portion separately to get a general sense of each modality. Both worked fine, but I found that it was essential to include total read counts per cell as a covariate in the PeakVI model; otherwise the UMAP was a single thick curve with cells ordered by sequencing depth.

However, when I modeled both modalities together with MultiVI and included total ATAC read counts per cell (and/or other, RNA-based QC metrics) as continuous covariate(s), the resulting UMAP was just a homogeneous round cloud. Training without any covariates works, but I see pronounced sequencing depth gradients, which makes me think that I do need to include them. Could you please advise whether something is wrong with my strategy, or whether I am specifying the covariates incorrectly?
Thanks!
import scanpy as sc
import scvi

# Register the raw counts layer and the sequencing-depth covariate
scvi.model.MULTIVI.setup_anndata(
    adata,
    layer="counts",
    continuous_covariate_keys=["total_ATAC_counts"],
)

# Number of features per modality, taken from adata.var["modality"]
my_model = scvi.model.MULTIVI(
    adata,
    n_genes=(adata.var["modality"] == "Gene Expression").sum(),
    n_regions=(adata.var["modality"] == "Peaks").sum(),
)
my_model.train()

# Build the UMAP on the joint latent space and color it by ATAC depth
adata.obsm["X_multivi"] = my_model.get_latent_representation()
sc.pp.neighbors(adata, use_rep="X_multivi")
sc.tl.umap(adata, spread=2)
sc.pl.umap(adata, color="total_ATAC_counts", color_map="plasma_r", size=2)
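
For reference, a per-cell depth column such as total_ATAC_counts could be derived, and optionally rescaled, roughly as follows. This is a minimal sketch under assumptions: the helper name `depth_covariate`, the toy matrix and the peak mask are all hypothetical, and the post does not say how the real column was computed. The log-and-standardize step is only one possible scaling to try before registering the value as a continuous covariate; whether it changes the MultiVI result is not something this sketch demonstrates.

```python
import numpy as np


def depth_covariate(counts, is_peak):
    """Return raw and log-standardized per-cell totals over peak columns.

    counts  : (n_cells, n_features) array of raw counts (hypothetical input)
    is_peak : boolean mask of length n_features marking ATAC peak columns
    """
    counts = np.asarray(counts)
    is_peak = np.asarray(is_peak, dtype=bool)

    # Sum the counts of each cell across the peak columns only
    total = counts[:, is_peak].sum(axis=1).astype(float)

    # Compress the dynamic range, then center and scale to unit variance
    log_total = np.log1p(total)
    std = log_total.std()
    if std > 0:
        scaled = (log_total - log_total.mean()) / std
    else:
        # All cells have identical depth: nothing to scale
        scaled = np.zeros_like(log_total)
    return total, scaled


# Toy example: 3 cells, 2 gene columns followed by 2 peak columns
counts = np.array([[1, 0, 5, 2],
                   [3, 1, 0, 0],
                   [0, 2, 9, 4]])
is_peak = np.array([False, False, True, True])

total, scaled = depth_covariate(counts, is_peak)
```

With real data, `is_peak` would presumably correspond to `adata.var["modality"] == "Peaks"`, and either returned vector could be stored in `adata.obs` before calling `setup_anndata`.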