Since I am very interested in modelling with extra categorical and continuous covariates passed to setup_anndata, I am wondering: as of which release is the correction of the latent space for these covariates actually implemented (rather than the covariates merely being registered during anndata setup without being used further)? I understand this is the case for the latest release, but what about older releases (e.g. 0.8)?
Is this different for categorical and continuous covariates?
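For reference, this is the kind of setup I have in mind (a minimal sketch; "donor" and "percent_mito" are placeholder names for obs columns in my own data):

import scvi

# minimal sketch of passing extra covariates to setup_anndata
# ("donor" and "percent_mito" are placeholders for my own obs columns)
scvi.data.setup_anndata(
    adata,
    batch_key="orig.ident",
    layer="counts",
    categorical_covariate_keys=["donor"],        # extra categorical covariate(s)
    continuous_covariate_keys=["percent_mito"],  # extra continuous covariate(s)
)
model = scvi.model.SCVI(adata)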
Thanks for providing this great tool and continuously improving it.
I have now switched to the latest version of scvi to make use of extra covariates. However, much of the user interface has changed, and I encountered something strange while training my models.
I have a rather large dataset of >100k cells, which I previously trained with:
model.train(n_epochs=5, n_iter_kl_warmup=1600, n_epochs_kl_warmup=None, frequency=1)
This converged very quickly in v0.8.1.
Now I am training a similar set of cells in v0.13.0 with the code shown further below.
However, this does not converge as quickly; in fact, even after 50 epochs I had not reached convergence, although the model is clearly still learning. This was also independent of whether covariates were used.
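For what it's worth, my attempt at reproducing the old warmup settings in the new API looks like this (a sketch; I am assuming plan_kwargs is forwarded to the training plan and that n_iter_kl_warmup corresponds to n_steps_kl_warmup there):

# sketch: mapping the old v0.8.1 warmup keywords onto v0.13.0 plan_kwargs
# (assumes n_iter_kl_warmup corresponds to n_steps_kl_warmup in the new training plan)
model.train(
    max_epochs=50,
    plan_kwargs={
        "n_steps_kl_warmup": 1600,   # old: n_iter_kl_warmup=1600
        "n_epochs_kl_warmup": None,  # old: n_epochs_kl_warmup=None
    },
)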
Dear Adam,
This is the relevant code for both versions; I left out the data loading part. The dataset size is ~550k cells. Please also see the ELBO plots. In this regard, I may not have used the term "convergence" in a strictly technical sense; I was just referring to "no substantial further loss reduction per epoch". Clearly, the model parameters differ, as some of the keywords from the old version are not available in the latest version. Please let me know whether you need any further information regarding the versions of any dependency packages.
I have to say the resulting UMAP plots (not part of the code here) are very comparable in quality, which is easy to judge since the dataset consists only of PBMCs, which have a well-known composition.
v0.13.0:
import scanpy as sc
import scvi

# normalize data
sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
sc.pp.log1p(adata)
adata.raw = adata
# find HVGs
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor="cell_ranger", batch_key="orig.ident")
# setup anndata
scvi.data.setup_anndata(adata, batch_key="orig.ident", layer="counts")
# train model
model = scvi.model.SCVI(adata)
model.train(max_epochs=500, early_stopping=True)
# plot training history
train_elbo = model.history["elbo_train"][1:]  # drop the first epoch for a more readable plot
test_elbo = model.history["elbo_validation"]
ax = train_elbo.plot()
test_elbo.plot(ax=ax)
v0.8.1:
import matplotlib.pyplot as plt
import pandas as pd
import scanpy as sc
import scvi

# normalize data
sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
sc.pp.log1p(adata)
adata.raw = adata
# find HVGs
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor="cell_ranger", batch_key="orig.ident")
# setup anndata
scvi.data.setup_anndata(adata, batch_key="orig.ident", layer="counts")
# train model
model = scvi.model.SCVI(adata)
model.train(n_epochs=3, n_iter_kl_warmup=1600, n_epochs_kl_warmup=None, frequency=1)
# plot training history
train_test_results = pd.DataFrame(model.trainer.history).rename(
    columns={"elbo_train_set": "Train", "elbo_test_set": "Test"}
)
print(train_test_results)
ax = train_test_results.plot()
ax.set_xlabel("Epoch")
ax.set_ylabel("Error")
plt.show()
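In case it is relevant, I have also been considering loosening the early stopping in v0.13.0 so training runs longer before halting (a sketch; I am assuming these keyword arguments are forwarded to scvi.train.Trainer, and the exact names may differ between releases):

# sketch: making early stopping less aggressive in v0.13.0
# (assumes these kwargs are forwarded to scvi.train.Trainer; names may differ by release)
model.train(
    max_epochs=500,
    early_stopping=True,
    early_stopping_monitor="elbo_validation",  # metric watched for improvement
    early_stopping_patience=45,                # epochs without improvement before stopping
    early_stopping_min_delta=0.0,              # minimum change that counts as improvement
)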