I am working with 10x multiome data. I first trained an scVI model and a PeakVI model on the RNA and ATAC portions separately to get a general sense of each modality. Both worked fine, but I found that it was essential to include total read counts per cell as a covariate in the PeakVI model; otherwise the UMAP was a single thick curve with cells ordered by sequencing depth. However, when I modeled both modalities together with MultiVI and included total ATAC read counts per cell (and/or other, RNA-based QC metrics) as covariate(s), the resulting UMAP was just a homogeneous round cloud. Leaving the covariates out works, but I then see pronounced sequencing-depth gradients, which makes me think that I do need to include them. Could you please advise whether there is something wrong with my strategy, or whether I am specifying the covariates incorrectly?
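import scanpy as sc
import scvi

# adata: the 10x multiome AnnData object, with genes and peaks both in adata.var
# Register the raw counts layer and the per-cell ATAC depth as a continuous covariate.
scvi.model.MULTIVI.setup_anndata(
    adata,
    layer="counts",
    continuous_covariate_keys=["total_ATAC_counts"],
)

my_model = scvi.model.MULTIVI(
    adata,
    n_genes=(adata.var["modality"] == "Gene Expression").sum(),
    n_regions=(adata.var["modality"] == "Peaks").sum(),
)
my_model.train()

# Embed the joint latent space and color the UMAP by ATAC depth.
adata.obsm["X_multivi"] = my_model.get_latent_representation()
sc.pp.neighbors(adata, use_rep="X_multivi")
sc.tl.umap(adata, spread=2)
sc.pl.umap(adata, color="total_ATAC_counts", color_map="plasma_r", size=2)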
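
For context on the covariate itself, here is a minimal sketch of how a per-cell ATAC depth value could be derived from the counts matrix and put on a log, unit-variance scale before being registered. The helper names `per_cell_totals` and `log_standardize`, the toy matrix, and the scaling step are all assumptions for illustration; the post above passes the raw `total_ATAC_counts`, and nothing here establishes that rescaling changes the round-cloud result.

```python
import numpy as np


def per_cell_totals(counts, feature_mask):
    """Sum raw counts over the selected features for every cell (row)."""
    counts = np.asarray(counts)
    return counts[:, feature_mask].sum(axis=1)


def log_standardize(totals):
    """log1p-transform, then z-score, so the covariate has mean 0 and std 1."""
    logged = np.log1p(np.asarray(totals, dtype=float))
    return (logged - logged.mean()) / logged.std()


# Toy stand-in for adata.layers["counts"]: 4 cells x 5 features (dense).
counts = np.array([
    [1, 0, 2, 10, 30],
    [0, 3, 1, 100, 300],
    [2, 2, 0, 1000, 3000],
    [1, 1, 1, 5, 15],
])
# Toy stand-in for adata.var["modality"]: 3 genes followed by 2 peaks.
modality = np.array(["Gene Expression"] * 3 + ["Peaks"] * 2)

# Raw per-cell ATAC depth, analogous to total_ATAC_counts in the post.
total_atac = per_cell_totals(counts, modality == "Peaks")

# Hypothetical rescaled version of the same covariate.
log_total_atac_scaled = log_standardize(total_atac)
```

With a real AnnData object the counts layer is often a SciPy sparse matrix, so the selection and sum would need the sparse equivalents. The rescaled values could then be stored in `adata.obs` under a hypothetical key such as `log_total_ATAC_counts_scaled` and listed in `continuous_covariate_keys` in place of the raw totals.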