scVI and subtyping

I have a question about subclustering and the scVI model. Specifically, I want to subcluster some immune cells in my sample to identify the subtypes we are interested in. However, I’m unsure whether I should train a new model on the subset of immune cells or whether I can reuse the previously trained model.

Also, I wanted to check whether I understand the differential expression module correctly: it requires the model to be trained, correct? Or does it perform inference independently of training?

Hi LinearParadox,

I would subcluster using the original embeddings for the cell type you want to subcluster, without retraining. The ‘resolution’ of the scVI embedding space is quite high, so finer structure within a cell type should generally already be present in it. For visualization, I would fit a new UMAP/t-SNE/MDE on the original embeddings for the subset of cells. These methods fill out the 2D plot space with only the cells you give them, making it clearer which cells are neighbors in the embedding space.
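
As a rough sketch of the above with scanpy: this assumes the latent representation from the original model is already stored in `adata.obsm["X_scVI"]` and that cell type labels live in an `adata.obs` column. The function name, the `"X_scVI"` and `"subcluster"` keys, and the resolution value are placeholders I picked for illustration, so adjust them to your data.

```python
def subcluster_on_existing_latent(adata, cell_type_key, target,
                                  rep_key="X_scVI", resolution=0.5):
    """Subcluster one cell type using the embedding already in adata.obsm[rep_key].

    All key names and the resolution are illustrative assumptions.
    """
    # Imported here so the function can be defined before scanpy is loaded.
    import scanpy as sc

    if rep_key not in adata.obsm:
        raise KeyError(f"{rep_key!r} not found in adata.obsm")

    # Keep only the cells of interest; .copy() turns the view into its own object.
    sub = adata[adata.obs[cell_type_key] == target].copy()

    # Neighbor graph on the ORIGINAL latent space (no retraining).
    sc.pp.neighbors(sub, use_rep=rep_key)

    # Leiden clustering restricted to the subset.
    sc.tl.leiden(sub, resolution=resolution, key_added="subcluster")

    # Fresh 2D layout fitted on the subset only, for visualization.
    sc.tl.umap(sub)
    return sub
```

The returned object has the subcluster labels in `sub.obs["subcluster"]`, and `sc.pl.umap(sub, color="subcluster")` would plot them. The resolution usually needs some tuning by eye.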

Regarding differential expression: yes, the model needs to be trained to give meaningful results, so it is not independent of training. It uses the inference network (encoder) to obtain the uncertainty in the latent representation of cells from the different conditions, and the decoder then maps that uncertainty to gene expression values.
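
For comparing two subclusters with the already trained model, a minimal sketch could look like the following. It assumes `model` is the trained `scvi.model.SCVI` object and that the subcluster labels have been written into an `adata.obs` column of the object the model was set up on. The wrapper name and the column and group names are hypothetical.

```python
def compare_subclusters(model, adata, groupby, group1, group2, delta=0.25):
    """Run scVI differential expression between two groups of cells.

    `model` is assumed to be a trained scvi.model.SCVI instance and
    `groupby` a column in adata.obs (the names are illustrative).
    """
    # DE relies on the learned encoder and decoder, so refuse an untrained model.
    if not model.is_trained:
        raise RuntimeError("Train the model before running differential expression.")

    # mode="change" tests whether the log fold change exceeds `delta` in magnitude.
    return model.differential_expression(
        adata=adata,
        groupby=groupby,
        group1=group1,
        group2=group2,
        mode="change",
        delta=delta,
    )
```

A call such as `compare_subclusters(model, adata, "subcluster", "0", "1")` should return a pandas DataFrame with one row per gene. Leiden labels in scanpy are strings, hence `"0"` and `"1"` rather than integers.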

Hope this helps!