I apologize for reposting here—I initially submitted this as a GitHub issue, but was kindly redirected to ask usage questions on Discourse.
I’m working with Xenium spatial transcriptomics data without any prior cell‐type labels. My core question is: Can ResolVI’s unsupervised mode serve as a reliable foundation for cell annotation? For datasets where no reference labels exist, what is the recommended workflow to apply ResolVI for accurate, unsupervised cell‐type identification?
Any guidance on best practices, downstream clustering strategies, or examples of how others have approached annotation in a fully unsupervised setting would be greatly appreciated.
Hi, thanks for reposting. I would recommend running resolVI and then running scVIVA on the resolVI coordinates. If you have labels, I would consider providing them to resolVI in supervised mode. However, you can also just run clustering on the resolVI coordinates.
In the cancer case study, we also provided very coarse labels, and I think that will be fine. We used ProSeg-estimated counts there. In my experience, they tend to produce quite good coarse clusters that can be used for supervised resolVI. Please don’t use resolVI-corrected counts in another SCVI model. This can have negative effects and likely doesn’t help much with performance.
In most instances, the gains from using supervised resolVI are limited. In my hands, it is critical when segmentation is bad, as in VisiumHD data from spatially dense tissues such as cancer tissue, or with the original Xenium segmentation from a couple of Spaceranger versions ago.
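To make the workflow above concrete, here is a minimal sketch, not a definitive recipe. It assumes that "resolVI coordinates" means the model's latent representation, that scvi-tools (with `scvi.external.RESOLVI`) and scanpy with a Leiden backend are installed, that raw counts sit in `adata.layers["counts"]`, and that spatial coordinates are in `adata.obsm["X_spatial"]`. The function name, the `X_resolVI` and `leiden_resolvi` keys, and the epoch and resolution values are placeholders chosen for illustration.

```python
from typing import Optional


def cluster_on_resolvi_latent(
    adata,
    layer: str = "counts",              # assumption: raw counts are stored in this layer
    labels_key: Optional[str] = None,   # optional coarse labels in adata.obs
    max_epochs: int = 100,              # placeholder, tune for your data
    resolution: float = 1.0,            # placeholder Leiden resolution
):
    """Train resolVI and run Leiden clustering on its latent space."""
    # Imported lazily so this sketch can be loaded without the packages installed.
    import scanpy as sc
    from scvi.external import RESOLVI

    # Assumption: adata.obsm["X_spatial"] holds the cell coordinates that
    # resolVI uses to find spatial neighbors during data preparation.
    RESOLVI.setup_anndata(adata, layer=layer, labels_key=labels_key)

    # Unsupervised by default; semisupervised only if coarse labels are given.
    model = RESOLVI(adata, semisupervised=labels_key is not None)
    model.train(max_epochs=max_epochs)

    # Cluster on the latent representation, not on corrected counts.
    adata.obsm["X_resolVI"] = model.get_latent_representation(adata)
    sc.pp.neighbors(adata, use_rep="X_resolVI")
    sc.tl.leiden(adata, resolution=resolution, key_added="leiden_resolvi")
    return model
```

Calling `cluster_on_resolvi_latent(adata)` would give the fully unsupervised variant. Passing `labels_key="coarse_label"` (a hypothetical column name) would correspond to the supervised mode with coarse labels.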
Yes. However, keep in mind that scVIVA will identify erroneous signal if segmentation is not optimized. We start there with optimized segmentation. Replacing scANVI with resolVI in scVIVA is safe, though, and if your segmentation is reasonably good, this pipeline makes sense.
Hi, thanks for the nice discussion, I enjoyed it very much.
I am thinking about potential uses of resolVI corrected counts and wonder why using them in other SCVI models is not recommended.
My first thought is that the reconstruction loss would be inaccurate, since it is based on NB/ZINB?
Take the task of mapping scRNA-seq to Xenium, for instance: could we then use the normalized counts and resolVI corrected counts in models that take normalized counts, such as trVAE with MSE loss?
Hi, the generated counts again follow a negative binomial (or Poisson) distribution, because we use posterior predictive samples in resolVI. In general, the predictions of a model contain biases that are not present in the original data (e.g. 5000 genes are compressed to 10 dimensions). Training a second model on generated counts can increase these biases. For a general discussion, the Nature article "AI models collapse when trained on recursively generated data" provides some ideas. Developing a joint model of scRNA-seq and spatial data is likely more promising. However, in my opinion there is currently no joint model that I can recommend without reservations. In addition, we can identify cell types at a higher resolution in scRNA-seq data than in spatial data. Transferring labels at this granular resolution will likely lead to hallucinations (cells labeled as e.g. resident memory T cells even though no reliable markers were available in the spatial data).
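To illustrate the point about biases compounding, here is a deliberately simple toy simulation. It is not resolVI and not any scvi-tools API: the "model" just estimates per-gene Poisson rates and pulls them toward the global mean, where the hypothetical `shrinkage` parameter stands in for information lost through compression. Each generation is fit on counts sampled from the previous generation's fit.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson sample with Knuth's algorithm (fine for small rates)."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def generate_counts(rates, n_cells, rng):
    """Sample a cells x genes count matrix from per-gene Poisson rates."""
    return [[sample_poisson(r, rng) for r in rates] for _ in range(n_cells)]


def fit_shrunken_rates(counts, shrinkage):
    """Toy 'model': per-gene mean, shrunk toward the mean over all genes."""
    n_cells, n_genes = len(counts), len(counts[0])
    gene_means = [sum(row[g] for row in counts) / n_cells for g in range(n_genes)]
    global_mean = sum(gene_means) / n_genes
    return [(1 - shrinkage) * m + shrinkage * global_mean for m in gene_means]


def recursive_bias(true_rates, n_generations=3, n_cells=2000, shrinkage=0.3, seed=0):
    """Mean absolute error to the true rates after each generation of refitting."""
    rng = random.Random(seed)
    rates = list(true_rates)
    errors = []
    for _ in range(n_generations):
        counts = generate_counts(rates, n_cells, rng)   # data from the previous fit
        rates = fit_shrunken_rates(counts, shrinkage)   # refit on generated data
        errors.append(
            sum(abs(f - t) for f, t in zip(rates, true_rates)) / len(true_rates)
        )
    return errors


errors = recursive_bias([0.5, 1.0, 2.0, 4.0, 8.0])
print(errors)
```

With these toy settings, the error relative to the true rates grows from one generation to the next, because every refit applies the same shrinkage on top of the previous one. Real models have subtler biases, but the compounding mechanism is the same in spirit.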