I’m looking for a way to transfer labels from a CITE-seq reference to a CITE-seq query. Initially I thought this would not be much different from the usual workflow with scVI or scArches: I’d simply replace scVI with totalVI in one of those tutorials and be done.
But then I realized that SCANVI.from_scvi_model has no corresponding function for totalVI, so I’m stuck.
Is there a simple solution that I have missed? If not, why is it harder to transfer labels for CITE-seq?
That is correct: as of now, you cannot train a classifier on top of totalVI in a way that also affects the encoder (that is, there is no model analogous to scANVI for totalVI).
In this tutorial, I train a random forest (RF) classifier on top of the latent space and show how to store it as an attribute of the model class (so that it is saved and loaded with the model).
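A minimal sketch of the classifier-on-the-latent-space idea, assuming scikit-learn's RandomForestClassifier. In practice `ref_latent` and `query_latent` would come from `get_latent_representation()` on the query-mapped totalVI model; here they are replaced by synthetic toy data, and `transfer_labels` is a hypothetical helper name. Attaching the fitted classifier to the model object, as the tutorial does, is not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def transfer_labels(ref_latent, ref_labels, query_latent, seed=0):
    """Fit a random forest on reference latent coordinates and label the query.

    Assumes both latents come from the same (query-mapped) model, so the
    reference and query cells share one coordinate system.
    """
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(ref_latent, ref_labels)
    # Class probabilities double as a rough per-cell confidence score.
    proba = clf.predict_proba(query_latent)
    predicted = clf.classes_[proba.argmax(axis=1)]
    confidence = proba.max(axis=1)
    return clf, predicted, confidence


# Toy stand-in for latent spaces: two well-separated clusters in 10 dimensions.
rng = np.random.default_rng(0)
ref_latent = np.vstack([rng.normal(-3, 1, (50, 10)), rng.normal(3, 1, (50, 10))])
ref_labels = np.array(["B cell"] * 50 + ["T cell"] * 50)
query_latent = np.vstack([rng.normal(-3, 1, (5, 10)), rng.normal(3, 1, (5, 10))])

clf, predicted, confidence = transfer_labels(ref_latent, ref_labels, query_latent)
print(predicted, confidence)
```

Because the classifier is fit after the fact, its loss never reaches the encoder, so it cannot reshape the latent space; that is the difference from a scANVI-style model.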
I also want to add that this is a requested feature, so we can look into it as an enhancement this fall.
For my understanding: if the encoder does not get retrained, what happens in your tutorial’s section ‘Query model training’? I imagine the network weights learned on the reference are simply applied to the raw counts of the query to compute the latent space. Or is something more complex going on, such as (re)training the decoder?
The tutorial you shared does not load scArches and does not call any model from it. Are you saying it still somehow uses scArches in its section ‘Query model training’ (linked in my last post)? I feel this point is crucial for understanding how scvi-tools and scArches work together.
load_query_data performs exactly the scArches architecture surgery. Any scvi-tools model used from the scArches codebase just calls the code in scvi-tools, so there is really no difference between the packages except for the few models that are unique to the scArches codebase.
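To make "architecture surgery" concrete, here is a toy numpy sketch, not scvi-tools code. A layer that receives a one-hot batch covariate (in the encoder, the decoder, or both, depending on how the reference model was set up) gets new input rows for the query batches. Only those rows are updated, and everything learned on the reference stays frozen. In scvi-tools the real entry point is `TOTALVI.load_query_data(adata_query, reference_model)` followed by `.train()` on the returned model. Which weights stay frozen there is controlled by that method's keyword arguments, so the "freeze everything except the new rows" rule below is a simplifying assumption. `surgery`, `layer` and `masked_sgd_step` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_hidden = 20, 8
n_ref_batches, n_query_batches = 2, 1

# Reference weights of one layer that sees [expression, one-hot batch]:
# the first n_genes rows act on expression, the last n_ref_batches rows on batch.
W_ref = rng.normal(size=(n_genes + n_ref_batches, n_hidden))


def surgery(W, n_new):
    """Append small random rows for new batch categories.

    Returns the extended weights and a boolean mask of trainable rows:
    only the new batch rows are trainable; all reference rows are frozen.
    """
    new_rows = rng.normal(scale=0.01, size=(n_new, W.shape[1]))
    W_ext = np.vstack([W, new_rows])
    trainable = np.zeros(W_ext.shape[0], dtype=bool)
    trainable[W.shape[0]:] = True
    return W_ext, trainable


def layer(x, batch_idx, W, n_batches):
    """Linear layer + ReLU on expression concatenated with a one-hot batch vector."""
    one_hot = np.zeros(n_batches)
    one_hot[batch_idx] = 1.0
    return np.maximum(np.concatenate([x, one_hot]) @ W, 0.0)


def masked_sgd_step(W, grad, trainable, lr=0.1):
    """Gradient step that leaves frozen (reference) rows untouched."""
    return W - lr * grad * trainable[:, None]


W_surgery, trainable = surgery(W_ref, n_query_batches)
n_all = n_ref_batches + n_query_batches

# Placeholder gradient; in a real model it would come from the loss on query cells.
grad = rng.normal(size=W_surgery.shape)
W_query = masked_sgd_step(W_surgery, grad, trainable)

x = rng.poisson(2.0, size=n_genes).astype(float)   # toy counts for one cell
h_ref_before = layer(x, 0, W_ref, n_ref_batches)    # reference cell, original weights
h_ref_after = layer(x, 0, W_query, n_all)           # same cell after surgery + update
h_query = layer(x, n_ref_batches, W_query, n_all)   # same counts, tagged as query batch
```

In this toy, a reference cell's one-hot vector is zero at the new positions and the reference rows are frozen, so its layer output is unchanged after surgery. Only cells from the new batch are affected by the trained rows.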