Nonsense UMAP when including categorical covariates in a MULTIVI model


I am working with 10x multiome data. I first built an scVI model and a PeakVI model on the RNA and ATAC portions separately to get a general sense of each modality. Both worked fine, but I found that it was essential to include total read counts per cell as a covariate in the PeakVI model; otherwise the UMAP was just a single thick curve with cells ordered by sequencing depth.

However, when I used both modalities together and included total ATAC read counts per cell (and/or other, RNA-based QC metrics) as covariate(s) in the MultiVI model, the resulting UMAP was just a homogeneous round cloud. Leaving the covariates out works, but I then see pronounced sequencing-depth gradients, which makes me think I do need to include them. Could you please advise whether there is something wrong with my strategy, or whether I am specifying the covariates incorrectly?
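One thing that may be worth ruling out (an assumption, not a confirmed diagnosis of the problem described above): raw total counts are typically in the thousands or more, and a continuous covariate on that scale can make neural-network training unstable. A minimal sketch of log-transforming and standardizing the depth before registering it as a covariate follows. The helper name `scaled_log_depth` and the column name `log_total_ATAC_counts` are hypothetical and not part of scvi-tools.

```python
import numpy as np

def scaled_log_depth(total_counts):
    """Return log1p-transformed depth, shifted to zero mean and unit variance."""
    counts = np.asarray(total_counts, dtype=float)
    log_depth = np.log1p(counts)
    std = log_depth.std()
    if std == 0:
        # Every cell has the same depth, so the covariate carries no information.
        return np.zeros_like(log_depth)
    return (log_depth - log_depth.mean()) / std

# Hypothetical usage with the AnnData object from the post (not run here):
# adata.obs["log_total_ATAC_counts"] = scaled_log_depth(adata.obs["total_ATAC_counts"])

# Small synthetic example so the sketch runs on its own.
raw_counts = [1200, 5400, 23000, 880, 15000]
depth = scaled_log_depth(raw_counts)
print(depth.round(2))
```

The scaled column could then be passed in `continuous_covariate_keys` in place of the raw counts. Whether this changes the resulting embedding would need to be checked on the actual data.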


scvi.model.MULTIVI.setup_anndata(adata,
                                 layer = 'counts',
                                 continuous_covariate_keys = ['total_ATAC_counts'])

my_model = scvi.model.MULTIVI(adata,
                              n_genes = (adata.var["modality"] == "Gene Expression").sum(),
                              n_regions = (adata.var["modality"] == "Peaks").sum())


adata.obsm["X_multivi"] = my_model.get_latent_representation()

sc.pp.neighbors(adata,
                use_rep = "X_multivi")
sc.tl.umap(adata,
           spread = 2)
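To put a number on the sequencing-depth gradient rather than judging it from the UMAP alone, one option is to correlate each latent dimension with log depth. The sketch below does this with NumPy on synthetic data. The function name `depth_correlation` is hypothetical, and the commented usage assumes the `X_multivi` embedding and `total_ATAC_counts` column from the code above.

```python
import numpy as np

def depth_correlation(latent, total_counts):
    """Pearson correlation of each latent dimension with log1p(total counts)."""
    latent = np.asarray(latent, dtype=float)
    log_depth = np.log1p(np.asarray(total_counts, dtype=float))
    z = latent - latent.mean(axis=0)   # center each latent dimension
    d = log_depth - log_depth.mean()   # center log depth
    denom = np.sqrt((z ** 2).sum(axis=0) * (d ** 2).sum())
    corr = np.full(latent.shape[1], np.nan)
    # Constant dimensions have zero variance and are left as NaN.
    np.divide((z * d[:, None]).sum(axis=0), denom, out=corr, where=denom > 0)
    return corr

# Hypothetical usage with the objects from the post (not run here):
# depth_correlation(adata.obsm["X_multivi"], adata.obs["total_ATAC_counts"])

# Synthetic check: dimension 0 tracks depth, dimension 1 is independent noise.
rng = np.random.default_rng(0)
counts = rng.integers(500, 50_000, size=200)
latent = np.column_stack([
    np.log1p(counts) + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])
corr = depth_correlation(latent, counts)
print(corr.round(2))
```

Comparing these correlations between a model trained with the covariate and one trained without it gives a rough measure of how much depth signal remains in the latent space.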

Hi, would you be able to include additional details such as scVI, PeakVI, and MultiVI UMAPs, as well as possibly plots for training and validation loss?
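For the loss curves requested above, a minimal sketch of summarizing a training history is shown below. It assumes the history is a mapping from metric name to a sequence of per-epoch values, and that the keys are `elbo_train` and `elbo_validation`; both key names are assumptions that should be checked against `my_model.history.keys()`. The helper `summarize_history` is hypothetical.

```python
import numpy as np

def summarize_history(history, train_key="elbo_train", val_key="elbo_validation"):
    """Summarize per-epoch loss values from a {name: values} mapping."""
    train = np.asarray(history[train_key], dtype=float).ravel()
    summary = {
        "epochs": int(train.size),
        "final_train": float(train[-1]),
        "train_improved": bool(train[-1] < train[0]),
    }
    if val_key in history:
        val = np.asarray(history[val_key], dtype=float).ravel()
        if val.size and not np.all(np.isnan(val)):
            # Ignore epochs where no validation value was recorded.
            summary["best_validation_epoch"] = int(np.nanargmin(val))
            summary["final_validation"] = float(val[~np.isnan(val)][-1])
    return summary

# Hypothetical usage with the trained model from the post (not run here):
# summarize_history(my_model.history)

# Synthetic example: training loss keeps falling, validation loss turns upward.
example_history = {
    "elbo_train": [1500.0, 1200.0, 1100.0, 1050.0],
    "elbo_validation": [1550.0, 1260.0, 1180.0, 1190.0],
}
summary = summarize_history(example_history)
print(summary)
```

The same arrays can be passed to any plotting library to produce the training and validation curves. A validation loss that stays flat or rises from the first epochs would suggest a training problem rather than a purely visual UMAP artifact.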