Denoising/removing ambient RNA with scAR, then scVI modeling?


I am working with multiple 10x-prepped samples. I would like to remove ambient RNA using a scAR model. Since this requires both the raw feature matrix and the filtered gene matrix, my plan is to run the scAR ambient removal on each sample first, then concatenate the resulting objects into a single adata object. Is it then feasible to train an scvi model on the denoised counts and use it for clustering, differential gene expression, doublet prediction, etc.?

My question is: is this an appropriate way to do this type of analysis? I know I can use SoupX to get comparable denoised counts, but I would prefer to stay entirely in Python if possible.

Hi there. I have not done this, but you’ve outlined the basic workflow I’ve been considering. I’m trying to build an alternative to an R-based workflow that uses scrublet and SoupX on the individual sample data objects, and harmony on the merged data. So I was thinking: SOLO → scAR → scVI.

So far I’ve just been testing scrublet → scVI, and it does okay. I found your post when searching for examples of people using a full scvi-tools workflow. If I make progress with scAR (and/or SOLO), I’ll check back here.


Glad to hear others are thinking about this. I actually went through with this and have a working pipeline that seems to perform well. For each sample I went scAR → doubletdetector (python package), removed the doublets, saved the raw and denoised counts to new layers, and overwrote adata.X with the denoised counts. Then I concatenated these into one large adata object, trained an scvi model on it, and clustered on the latent space.
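
For anyone following along, here is a rough sketch of what the scAR and scVI parts of a workflow like this might look like. It is not the code from the pipeline described above. It assumes the `scvi.external.SCAR` and `scvi.model.SCVI` interfaces as I understand them from the scvi-tools tutorials, so check the calls against your installed version. The function names and the layer names `raw_counts` and `denoised` are made up for illustration, and the sketch points scVI at the `denoised` layer rather than overwriting adata.X.

```python
def denoise_sample(filtered, raw):
    """Run scAR on one sample and keep raw and denoised counts as layers.

    filtered: AnnData built from the filtered matrix (cells only).
    raw: AnnData built from the raw matrix (cells plus empty droplets).
    """
    # Imported here so the function can be defined without scvi-tools loaded.
    from scvi.external import SCAR

    filtered = filtered.copy()
    filtered.layers["raw_counts"] = filtered.X.copy()  # hypothetical layer name

    SCAR.setup_anndata(filtered)
    # Assumption: estimates the ambient profile from the raw droplets and
    # stores it on `filtered` for the model to pick up.
    SCAR.get_ambient_profile(filtered, raw)

    model = SCAR(filtered)
    model.train()
    filtered.layers["denoised"] = model.get_denoised_counts()  # hypothetical layer name
    return filtered


def integrate(samples):
    """Concatenate denoised samples, train scVI, cluster on the latent space.

    samples: dict mapping sample name -> AnnData returned by denoise_sample
    (with doublets already removed by whichever tool you prefer).
    """
    import anndata as ad
    import scanpy as sc
    import scvi

    # The dict keys end up in adata.obs["sample"] and are appended to barcodes.
    adata = ad.concat(samples, label="sample", index_unique="-")

    scvi.model.SCVI.setup_anndata(adata, layer="denoised", batch_key="sample")
    model = scvi.model.SCVI(adata)
    model.train()

    adata.obsm["X_scVI"] = model.get_latent_representation()
    sc.pp.neighbors(adata, use_rep="X_scVI")
    sc.tl.leiden(adata)
    return adata, model
```

Doublet detection and removal would slot in between the two functions, run per sample before concatenation.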

I have the code working. If you have any questions, let me know; I’m happy to share how I went about it.

Great to hear it’s working for you! If you’re able to share a working example, I’m sure everyone would benefit from seeing the proof of concept. I’ll be setting something up in the next few weeks.
Quick question: can you say why you are using doubletdetector rather than scrublet or SOLO? I’d love to get some insight into the strengths and weaknesses of the different methods. Anecdotes are a fine starting point on the way to some solid benchmarks.