Excluding Ig and ribosomal genes from HVG selection in scVI, best practice?

marencc · October 21, 2025, 12:39pm

Hi everyone,

I’m working with single-cell RNA-seq data from CD45⁺ immune cells (mostly lymphoid lineages) and integrating multiple batches using scVI, which so far has given the best batch correction results.

We’re now reprocessing the data after adjusting QC thresholds, and I came across some recent papers where they state:

“Prior to PCA, nearest neighbor clustering, and UMAP representations, some genes were filtered from inclusion including those associated with Ig loci (Igk, Igl, or Igh), ribosomal proteins (Rps or Rpl), mitochondrial (mt-), sex (Xist),…”

My questions are:

1. Would it make sense to exclude these genes before computing HVGs, so that they never influence the latent space learned by scVI? Or is it better to compute HVGs normally, then remove these specific genes after HVG selection (e.g., set highly_variable=False for them)?
2. Once we have obtained broad cell type annotations (T cells, B cells, myeloid, etc.), is it advisable to subset, recalculate HVGs within one lineage (e.g. T cells), and retrain a new scVI model for finer clustering? Or is it acceptable to rely on the latent embeddings from the original full scVI model for the subcluster analysis?

Any insights or examples of good practice would be appreciated.

Thanks in advance!

ori-kron-wis · October 22, 2025, 9:15am

Hey,

On your first question: generally, the latter is the way to go, i.e. compute HVGs normally and then unflag those genes. They stay in the dataset, so the biological signal and downstream tasks such as DE are unaffected, but they no longer influence the latent space. That said, the right choice may also depend on the problem you are trying to solve.

On your second question: I think you can only gain by subsetting, recomputing HVGs, and retraining, considering the point above.
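
The "compute HVGs normally, then unflag" option could be sketched roughly as below. This is only an illustration, not an official recipe: the regex prefixes are taken from the quoted methods text and assume mouse gene symbols, the helper names are made up, and it assumes raw counts in `adata.layers["counts"]` and a `batch` column in `adata.obs`.

```python
import re

# Assumed mouse-symbol prefixes, following the quoted methods text.
# Prefix matching is coarse: it also catches unrelated genes such as
# Rps6ka1 (a kinase) or Ighmbp2, so inspect the matches before relying on them.
EXCLUDE = re.compile(r"^(Ig[hkl]|Rp[sl]|mt-)|^Xist$")


def is_excluded(gene):
    """Return True if the gene symbol matches one of the exclusion patterns."""
    return EXCLUDE.search(gene) is not None


def select_hvgs_then_mask(adata, n_top_genes=2000, batch_key="batch", layer="counts"):
    """Compute HVGs on all genes, then unflag the excluded gene families.

    Hypothetical helper. Assumes raw counts in adata.layers[layer] and that
    scanpy (plus scikit-misc, needed for flavor="seurat_v3") is installed.
    """
    import scanpy as sc  # imported lazily so is_excluded works without scanpy

    sc.pp.highly_variable_genes(
        adata,
        flavor="seurat_v3",
        n_top_genes=n_top_genes,
        layer=layer,
        batch_key=batch_key,
    )
    mask = [is_excluded(g) for g in adata.var_names]
    adata.var.loc[mask, "highly_variable"] = False
    # The full adata keeps every gene (e.g. for DE); only this subset is
    # meant to be passed to scVI.
    return adata[:, adata.var["highly_variable"].to_numpy()].copy()
```

The returned subset can then go through `scvi.model.SCVI.setup_anndata(..., layer="counts", batch_key="batch")` as usual. Note that after unflagging, fewer than `n_top_genes` genes remain, so you may want to request a few more up front.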

In any case, you can always compare the different strategies by running scib-metrics on the resulting latent space(s), and by checking that DE gives the results you expect.
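
A rough sketch of such a comparison, assuming each strategy's latent representation has already been stored in `adata.obsm` and that `adata.obs` has batch and cell type columns. The function name, the obsm keys, and the column names here are placeholders, and the exact `Benchmarker` options are worth checking against the scib-metrics docs for your version.

```python
def compare_latents(adata, embedding_keys, batch_key="batch", label_key="cell_type"):
    """Score several latent spaces side by side with scib-metrics.

    Hypothetical helper: embedding_keys are adata.obsm keys, one per strategy
    (e.g. one scVI run per HVG strategy).
    """
    missing = [k for k in embedding_keys if k not in adata.obsm]
    if missing:
        raise KeyError(f"missing embeddings in adata.obsm: {missing}")

    # Imported lazily so the key check above works without scib-metrics.
    from scib_metrics.benchmark import Benchmarker

    bm = Benchmarker(
        adata,
        batch_key=batch_key,
        label_key=label_key,
        embedding_obsm_keys=list(embedding_keys),
    )
    bm.benchmark()
    # One row per embedding, with batch-correction and bio-conservation scores.
    return bm.get_results(min_max_scale=False)
```

For example, `compare_latents(adata, ["X_scvi_all_hvg", "X_scvi_masked_hvg"])` would put the two strategies in one table. The bio-conservation scores depend on the labels you pass in, so they are only as reliable as the annotation.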
