scVI for developmental data

Hey there,
I want to fit an scVI model to developmental data, meaning the batches come from different timepoints and thus differ in many ways. Specifically, I intend to use this for Solo doublet removal, but my question is also general.

Would it be correct to indicate the different timepoints as batches?

Hi, thanks for your question. scVI wasn’t explicitly designed for integration across different timepoints, so your mileage may vary when trying to fit the model to developmental data.

I would give it a try and see how well it performs. You can pass in the timepoints with either batch_key or continuous_covariate_keys in SCVI.setup_anndata, depending on whether you want the model to treat it categorically (one-hot) or continuously.
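
To make the two options concrete, here is a minimal sketch rather than a definitive recipe. The helper names `timepoint_setup_kwargs` and `fit_scvi` and the column name passed in are hypothetical; only `SCVI.setup_anndata`, `batch_key`, and `continuous_covariate_keys` come from the answer above. The continuous option assumes the timepoint column in `adata.obs` is numeric (e.g. days), which the sketch does not check.

```python
def timepoint_setup_kwargs(timepoint_col, continuous=False):
    """Build the keyword arguments for scvi.model.SCVI.setup_anndata.

    Hypothetical helper: `timepoint_col` is the adata.obs column holding the timepoint.
    """
    if continuous:
        # Treated as a continuous covariate; the column is assumed to be numeric.
        return {"continuous_covariate_keys": [timepoint_col]}
    # Treated categorically (one-hot), like any other batch label.
    return {"batch_key": timepoint_col}


def fit_scvi(adata, timepoint_col, continuous=False, max_epochs=None):
    """Register the timepoint with scVI in the chosen way and train a model."""
    import scvi  # imported lazily so the helper above works without scvi-tools

    scvi.model.SCVI.setup_anndata(
        adata, **timepoint_setup_kwargs(timepoint_col, continuous)
    )
    model = scvi.model.SCVI(adata)
    model.train(max_epochs=max_epochs)
    return model
```

Fitting both variants on the same data and comparing the latent spaces is probably the quickest way to see which treatment suits your timepoints.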


This sounds like a good fit for Decipher: https://www.biorxiv.org/content/10.1101/2023.11.11.566719v1 It assumes correlated latent factors, which is what you want for time-series data, and instead of attempting batch correction it embeds the cells in a very low-dimensional space (2D).
Generally, I wouldn’t try to correct for the batch in this setup.


Thanks @cane11, looks like an interesting method. I think it doesn’t fit my needs (doublet removal in the embedding), but the idea is intriguing.

For doublet removal, train a single scVI model with or without batch information (it doesn’t really matter there), then run Solo on each batch separately, like:

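import numpy as np
import pandas as pd
import scvi

# Assumes rna (AnnData), scvi_model (trained SCVI), batch_key and configs are already defined.
batches = pd.unique(rna.obs[batch_key])
is_solo_singlet = np.ones((rna.n_obs,), dtype=bool)
for batch in batches:
    print("Running solo on batch {}...".format(batch))
    # restrict_to_batch limits Solo to the cells of one batch
    solo_batch = scvi.external.SOLO.from_scvi_model(scvi_model, restrict_to_batch=batch)
    solo_batch.train(max_epochs=configs["solo_max_epochs"])
    # Hard labels for this batch's cells; use batch_key here too, not a hard-coded "batch"
    is_solo_singlet[(rna.obs[batch_key] == batch).values] = solo_batch.predict(soft=False) == "singlet"
rna.obs["is_solo_singlet"] = is_solo_singlet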