inferCNVpy after running scVI with batch as key

zvittorio · October 20, 2022, 10:19am

Hi scverse community,

I have several datasets at my disposal: one is from a cancer patient, while the others are from the same organ in a healthy condition.
The idea is to use the cell types of the healthy samples as reference cells for inferCNVpy, but I first want to integrate them with the basic script for scVI.
I have some questions that come from being new to the scverse ecosystem (I’ve always used R/Bioc tools), but I think the answers could be useful for many users.
The history of the samples is the following:

1. Read alignment and quantification with Cell Ranger
2. QC, normalization, HVF selection, dimensionality reduction, clustering and annotation in R/Bioc (scater + Seurat)
3a. Merge all healthy samples into a single Seurat object
3b. Convert both (tumor + healthy) Seurat objects to SingleCellExperiment objects, retaining only raw counts and metadata (so cell types too)
3c. Convert the SCE objects to H5AD files with zellkonverter, so that adata.X contains raw counts
4. Import the AnnData objects in a notebook and merge them with anndata.concat
5. Normalization, HVF selection
6. Train an scVI model on the merged samples (I followed the introductory notebook from the scvi-tools website: Introduction to scvi-tools - scvi-tools)
7a. Append gene positions to the AnnData object
7b. Run infercnvpy (as shown here: Infer CNV on lung cancer dataset — infercnvpy documentation)
8. Some post-processing
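
For reference, the Python part of the workflow above (merging through running infercnvpy) could look roughly like the sketch below. It is only an outline under several assumptions: the function names, the `sample` batch column, the `condition` label, the 2000-gene HVG cutoff and the window size are illustrative choices, not taken from the actual notebook. The small helper checks for the `chromosome`, `start` and `end` columns that infercnvpy looks for in `adata.var`.

```python
REQUIRED_VAR_COLUMNS = ("chromosome", "start", "end")


def missing_position_columns(var_columns):
    """Return the genomic-position columns infercnvpy needs that are absent."""
    present = set(var_columns)
    return [col for col in REQUIRED_VAR_COLUMNS if col not in present]


def integrate_then_infer_cnv(healthy, tumor, gtf_path, cell_type_key,
                             reference_cell_types, batch_key="sample"):
    """Hypothetical helper: merge, train scVI, then run infercnvpy.

    `healthy` and `tumor` are AnnData objects with raw counts in .X.
    `batch_key` is assumed to be an obs column present in both objects.
    """
    # Heavy dependencies are imported lazily so the helper above stays
    # usable without them.
    import anndata as ad
    import infercnvpy as cnv
    import scanpy as sc
    import scvi

    # Merge the objects; "condition" records which object a cell came from.
    adata = ad.concat({"healthy": healthy, "tumor": tumor},
                      label="condition", join="inner")
    adata.obs_names_make_unique()
    adata.layers["counts"] = adata.X.copy()  # keep raw counts for scVI

    # Normalization and HVG selection; .X becomes log-normalized expression.
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key=batch_key)

    # Train scVI on raw counts of the HVGs; the full object is left untouched.
    adata_hvg = adata[:, adata.var["highly_variable"]].copy()
    scvi.model.SCVI.setup_anndata(adata_hvg, layer="counts", batch_key=batch_key)
    model = scvi.model.SCVI(adata_hvg)
    model.train()
    adata.obsm["X_scVI"] = model.get_latent_representation()

    # Append gene positions to adata.var and verify they are there.
    cnv.io.genomic_position_from_gtf(gtf_path, adata)
    missing = missing_position_columns(adata.var.columns)
    if missing:
        raise ValueError(f"adata.var lacks genomic position columns: {missing}")

    # Run infercnvpy on the log-normalized .X with healthy cell types as
    # reference; scores are stored in adata.obsm["X_cnv"].
    cnv.tl.infercnv(adata, reference_key=cell_type_key,
                    reference_cat=reference_cell_types, window_size=250)
    return adata, model
```

In this sketch scVI only contributes the latent space (`X_scVI`), while infercnvpy still runs on the log-normalized matrix of all genes.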

Now, the questions that I have are:

is scVI going to alter the adata.X matrix?
what is the anndata slot infercnvpy works on?
most of all, what does infercnvpy expect in the anndata object?
in general, is what I did reasonable? Besides starting directly with Scanpy so that no object/file conversion is needed…

Thanks a lot to anyone that can provide comments, help or answers!

Kind regards

Vittorio

zvittorio · December 19, 2022, 2:38pm

Update.

Now I know that:

scVI is not going to alter the adata.X matrix (unless I indicate to do so, I guess)
infercnvpy works on adata.X by default, but it has the layer argument (in infercnvpy.tl.infercnv) to take different layers as input
infercnvpy expects a “gene expression matrix, appropriately preprocessed” in the adata layer it is going to work on
I am still open to comments on the steps I list in the original post
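
To illustrate why the input matters, here is a strongly simplified NumPy sketch of the kind of computation an inferCNV-style method performs, loosely following the steps described in the infercnvpy documentation (log fold change against reference cells, clipping, running mean along the genome, per-cell centering). It is not the library's implementation: the function name and defaults are made up, it handles a single chromosome, and it skips the noise filtering step. It assumes the input is log-transformed, normalized expression with genes sorted by genomic position.

```python
import numpy as np


def simplified_cnv_scores(log_expr, is_reference, window_size=3, lfc_clip=3.0):
    """Toy inferCNV-style scores for one chromosome.

    log_expr: cells x genes array of log-normalized expression,
              genes ordered by genomic position.
    is_reference: boolean mask marking the reference (healthy) cells.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    is_reference = np.asarray(is_reference, dtype=bool)

    # 1. Log fold change of every cell against the mean reference profile.
    reference_mean = log_expr[is_reference].mean(axis=0)
    lfc = np.clip(log_expr - reference_mean, -lfc_clip, lfc_clip)

    # 2. Running mean over neighbouring genes to average out gene-level noise.
    kernel = np.ones(window_size) / window_size
    smoothed = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="valid"), 1, lfc
    )

    # 3. Center each cell at its own median.
    return smoothed - np.median(smoothed, axis=1, keepdims=True)


# Two reference cells and one cell with higher expression on genes 3 to 5.
expr = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
])
scores = simplified_cnv_scores(expr, is_reference=[True, True, False])
```

Because the signal is a difference from the reference profile averaged over neighbouring genes, anything that changes the expression scale or already smooths the values before this step will change the resulting scores.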

The open question that sums it up is the following: does it make sense to run infercnvpy on the normalized (decoded) gene expression from scVI? (basically the output of scvi.model.SCVI.get_normalized_expression)

Again, thanks a lot to anyone that can provide comments, help or answers!

Kind regards

Vittorio

adamgayoso · December 19, 2022, 11:45pm

Thanks for the post. I’m trying to understand where scvi-tools fits into this pipeline. Do you want to smooth the gene expression values?

zvittorio · January 11, 2023, 1:55pm

Sorry for the late reply, I must have missed the notification. I don’t want to smooth gene expression necessarily.
My question is how I should use the results of scVI model training in further analysis.
I have seen that the scVI normalized expression values are used for visualization across all batches.
But can the same values be used as input for other tools (in this example, infercnv)?

jpagolia · March 21, 2025, 10:10pm

I am encountering a similar issue and wanted to follow up on this question. In a typical tumor atlas project, you would want to verify your annotations of tumor cells by CNV. But the tumor samples are typically run in multiple batches, and infercnvpy (as well as inferCNV and other CNV inference packages) is susceptible to batch effects. I’d like to be able to use scvi.model.SCVI.get_normalized_expression() with the transform_batch parameter to obtain a batch-corrected gene expression matrix for direct input into infercnvpy. I’ve done this, but the results look much worse (overly smoothed, without clear CNV clusters) than the original non-batch-corrected results.
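
The approach described above can be sketched roughly as follows. This is a minimal outline, not a recommended recipe: the two function names, the layer name `scvi_lognorm`, the library size of 1e4 and the log1p transform are assumptions made for illustration. It also assumes `adata` contains exactly the genes the trained model was set up with, so a model trained on highly variable genes only yields a decoded matrix for those genes only.

```python
import numpy as np


def add_scvi_corrected_layer(model, adata, transform_batch,
                             layer_name="scvi_lognorm", library_size=1e4):
    """Store log1p of the scVI-decoded, batch-projected expression in a layer.

    `model` is a trained scvi.model.SCVI; `transform_batch` is the batch
    category (or list of categories) to project every cell onto.
    """
    decoded = model.get_normalized_expression(
        adata,
        transform_batch=transform_batch,
        library_size=library_size,
        return_numpy=True,
    )
    # Log-transform so the values are on the scale infercnvpy usually sees.
    adata.layers[layer_name] = np.log1p(np.asarray(decoded, dtype=np.float32))
    return layer_name


def infer_cnv_on_layer(adata, layer_name, reference_key, reference_cat):
    """Run infercnvpy on the given layer instead of adata.X."""
    import infercnvpy as cnv  # imported lazily; only needed for this step

    cnv.tl.infercnv(
        adata,
        reference_key=reference_key,
        reference_cat=reference_cat,
        layer=layer_name,
    )
```

The decoded values are a denoised estimate from the model rather than measured expression, so the layer will generally look smoother than log-normalized counts, which is consistent with the over-smoothed result reported above.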

Has anyone used scVI output as input for infercnvpy? How did you do it?
Does the strategy above make sense?
Any alternatives that people have run into?

@grst I’d greatly appreciate your insight if you have the time.

grst · March 24, 2025, 7:19am

Sorry, I haven’t yet attempted to use infercnv with batch correction. The best approach would probably be a dedicated model that learns the copy number variations while taking batch effects into account, but that would be a research project in itself.
