Is it possible to add extra covariate categories to a trained SCVI model?

I execute the following steps to predict cell types in a query dataset using a reference dataset:

  1. Train an SCVI model on a reference dataset with categorical covariates.
  2. Concatenate query and reference datasets.
  3. Create a SCANVI model using the concatenated dataset and the SCVI model above.
  4. Train the SCANVI model.

During the creation of the SCANVI model I get an error saying that the covariate categories of the query dataset differ from those of the reference.
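To see which categories trigger this kind of mismatch, a small check like the one below can help. It is only a sketch using plain Python containers; the function name unseen_categories and the covariate names in the usage note are made up for illustration and are not part of scvi-tools.

```python
def unseen_categories(reference_obs, query_obs):
    """Return, per covariate, the query categories that the reference lacks.

    Both arguments map a covariate name to an iterable of category values.
    Covariates without any new category are left out of the result.
    """
    unseen = {}
    for key, ref_values in reference_obs.items():
        # Categories present in the query but never seen in the reference.
        extra = sorted(set(query_obs.get(key, [])) - set(ref_values))
        if extra:
            unseen[key] = extra
    return unseen
```

With AnnData objects, the inputs could be built from the obs columns, for example {"donor": list(adata.obs["donor"])}, where "donor" stands in for whatever covariate was registered.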

The only workaround to this problem that I’ve found is to add the query categories to the reference adata object before training the reference. But it does not seem like a sustainable solution as I will need to retrain the SCVI model for every new query dataset.

Is there any way to add extra covariate categories to an already trained SCVI model?

Hi, we added support for extending categorical covariates with scArches in version 1.2. However, the from_scvi_model function doesn’t extend batches or covariates; for that you need to use load_query_data. First convert your SCVI model to a SCANVI model with from_scvi_model, and then, without training, call load_query_data with the query data. The API is designed so that, across all models, only load_query_data extends categories.


Great, many thanks for your answer, it helped a lot! However, I did need to train the SCANVI model before calling load_query_data, otherwise the predictions did not work. I’ve also found this resource, which also suggests training the SCANVI model.

After initializing the SCANVI model, you can also train it and afterwards integrate the query data with load_query_data. In the approach above (no training before load_query_data), you have to unfreeze the network by passing unfrozen=True to load_query_data; otherwise the SCANVI model doesn’t get trained (sorry for the oversight).
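For anyone landing here later, this is a minimal sketch of the workflow discussed in this thread: train SCVI on the reference only, convert it with from_scvi_model, train SCANVI, and only then bring in the query with load_query_data, which is the step that extends the categories. The function name predict_query_labels and the column names "batch", "donor", and "cell_type" are placeholders, not something from the original posts, so adapt them to your data. The unfrozen argument is passed through to load_query_data, where True overrides the other freeze options.

```python
def predict_query_labels(
    ref_adata,
    query_adata,
    labels_key="cell_type",          # placeholder column with reference labels
    unlabeled_category="Unknown",
    batch_key="batch",               # placeholder batch column
    covariate_keys=("donor",),       # placeholder categorical covariates
    unfrozen=False,
    max_epochs=None,
):
    """Sketch: reference SCVI -> SCANVI -> scArches query mapping -> predictions."""
    # Imported lazily so the function can be defined without scvi-tools installed.
    import scvi

    # 1. Train SCVI on the reference only, registering the categorical covariates.
    scvi.model.SCVI.setup_anndata(
        ref_adata,
        batch_key=batch_key,
        categorical_covariate_keys=list(covariate_keys),
    )
    scvi_ref = scvi.model.SCVI(ref_adata)
    scvi_ref.train(max_epochs=max_epochs)

    # 2. Convert to SCANVI and train it on the reference labels.
    scanvi_ref = scvi.model.SCANVI.from_scvi_model(
        scvi_ref,
        unlabeled_category=unlabeled_category,
        labels_key=labels_key,
    )
    scanvi_ref.train(max_epochs=max_epochs)

    # 3. Mark all query cells as unlabeled and align the query genes to the reference.
    query_adata = query_adata.copy()
    query_adata.obs[labels_key] = unlabeled_category
    scvi.model.SCANVI.prepare_query_anndata(query_adata, scanvi_ref)

    # 4. load_query_data is the call that extends batch and covariate categories.
    scanvi_query = scvi.model.SCANVI.load_query_data(
        query_adata, scanvi_ref, unfrozen=unfrozen
    )
    scanvi_query.train(max_epochs=max_epochs, plan_kwargs={"weight_decay": 0.0})

    # Predicted labels for the query cells.
    return scanvi_query.predict()
```

Because the reference models never see the query data, the same trained reference can be reused for every new query dataset; in practice you would save scanvi_ref once and skip steps 1 and 2 on later runs.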
