inferCNVpy after running scVI with batch as key

Hi scverse community,

I have a set of datasets at my disposal: one comes from a cancer patient, while the others come from the same organ in a healthy condition.
The idea is to use the cell types of the healthy samples as reference cells for inferCNVpy, but I first want to integrate them with the basic script for scVI.
Now, I have some questions that stem from my being new to the scverse ecosystem (I’ve always used R/Bioc tools), but I think the answers could be useful for many users.
The history of the samples is the following:

  1. read alignment and quantification with Cell Ranger
  2. QC, normalization, HVF selection, dim. reduction, clustering and annotation in R/Bioc (scater + Seurat)
    3a. Merge all healthy samples into a single Seurat object
    3b. Convert both (tumor + healthy) Seurat objects to SingleCellExperiment objects so as to retain only raw counts and metadata (so cell types too)
    3c. Convert SCE objects to H5AD files with zellkonverter, so that adata.X contains raw counts
  4. Import anndata objects in a notebook and merge them with anndata.concat
  5. Normalization, HVF selection
  6. Training scVI model on the merged samples (I followed the introductory notebook from scvi-tools website: Introduction to scvi-tools - scvi-tools)
    7a. Appending gene positions to anndata object
    7b. Running infercnvpy (as shown here: Infer CNV on lung cancer dataset — infercnvpy documentation)
  8. Some post-processing
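
For concreteness, the Python side of this pipeline (from anndata.concat through to infercnvpy) might look roughly like the sketch below. It is only an illustration under several assumptions: both objects carry raw counts in adata.X and share the obs columns "batch" and "cell_type", gtf_path points to a matching annotation file, and the names make_reference_labels, run_pipeline and "cnv_reference" are hypothetical. Healthy cells get their own reference label so that tumor cells with the same cell type annotation are not treated as reference.

```python
def make_reference_labels(conditions, cell_types, healthy="healthy"):
    """Label healthy cells as 'healthy:<cell type>' and every other cell as 'query'."""
    return [
        f"{healthy}:{ct}" if cond == healthy else "query"
        for cond, ct in zip(conditions, cell_types)
    ]


def run_pipeline(healthy, tumor, gtf_path):
    """Hypothetical end-to-end sketch: concat -> normalize -> scVI -> infercnvpy."""
    # Heavy dependencies are imported lazily, only when the pipeline is run.
    import anndata as ad
    import infercnvpy as cnv
    import scanpy as sc
    import scvi

    # Merge the two objects; the dict keys end up in adata.obs["condition"].
    adata = ad.concat({"healthy": healthy, "tumor": tumor},
                      label="condition", index_unique="-")

    # Keep the raw counts in a layer before adata.X is normalized in place.
    adata.layers["counts"] = adata.X.copy()
    sc.pp.normalize_total(adata, target_sum=1e4)
    sc.pp.log1p(adata)
    sc.pp.highly_variable_genes(adata, n_top_genes=2000, batch_key="batch")

    # Train scVI on the raw counts of the variable genes, with batch as key.
    adata_hvg = adata[:, adata.var["highly_variable"]].copy()
    scvi.model.SCVI.setup_anndata(adata_hvg, layer="counts", batch_key="batch")
    model = scvi.model.SCVI(adata_hvg)
    model.train()
    adata.obsm["X_scVI"] = model.get_latent_representation()

    # Use the healthy cell types as reference categories for infercnvpy.
    adata.obs["cnv_reference"] = make_reference_labels(
        adata.obs["condition"], adata.obs["cell_type"]
    )
    reference_cat = sorted(set(adata.obs["cnv_reference"]) - {"query"})

    # Append gene positions, then run infercnvpy on the log-normalized adata.X.
    cnv.io.genomic_position_from_gtf(gtf_path, adata)
    cnv.tl.infercnv(adata, reference_key="cnv_reference",
                    reference_cat=reference_cat, window_size=250)
    return adata, model
```

In this sketch scVI only contributes the latent representation (for integration, clustering and visualization), while infercnvpy runs on all genes of the log-normalized matrix rather than on the variable-gene subset.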

Now, the questions that I have are:

  • is scVI going to alter the adata.X matrix?
  • what is the anndata slot infercnvpy works on?
  • most of all, what does infercnvpy expect in the anndata object?
  • in general, is what I did reasonable? Besides starting directly with Scanpy so that no object/file conversion is needed…

Thanks a lot to anyone that can provide comments, help or answers!

Kind regards

Vittorio

Update.

Now I know that:

  • scVI is not going to alter the adata.X matrix (unless I indicate to do so, I guess)
  • infercnvpy works on adata.X by default, but it has the layer argument (in infercnvpy.tl.infercnv) to take different layers as input
  • infercnvpy expects a “gene expression matrix, appropriately preprocessed” in the adata layer it is going to work on
  • I am still open to comments on the steps I list in the original post

The open question that sums it all up is the following: does it make sense to run infercnvpy on the normalized (decoded) gene expression from scVI? (basically the output of scvi.model.SCVI.get_normalized_expression)
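
Mechanically, this would amount to storing the decoded values in a layer and pointing infercnvpy at it, as in the sketch below. Whether this is advisable is exactly the open question; the code only shows the plumbing. The assumptions are mine: the model was trained on the same genes that adata contains, library_size=1e4 followed by log1p is chosen to mimic the usual normalized, log-transformed scale, and the names add_scvi_layer, infer_cnv_from_layer and "scvi_lognorm" are hypothetical.

```python
import numpy as np


def add_scvi_layer(adata, model, layer_name="scvi_lognorm", library_size=1e4):
    """Store log1p of the scVI decoded expression in adata.layers[layer_name]."""
    decoded = model.get_normalized_expression(
        adata, library_size=library_size, return_numpy=True
    )
    # Log-transform so the layer is on a scale comparable to log-normalized counts.
    adata.layers[layer_name] = np.log1p(decoded)
    return adata


def infer_cnv_from_layer(adata, reference_key, reference_cat,
                         layer_name="scvi_lognorm"):
    """Run infercnvpy on the chosen layer instead of adata.X."""
    import infercnvpy as cnv  # imported lazily; only needed for this step

    cnv.tl.infercnv(adata, reference_key=reference_key,
                    reference_cat=reference_cat, layer=layer_name)
    return adata
```

Because the decoded values live in their own layer, adata.X stays untouched, so the result can be compared directly against a run on the ordinary log-normalized matrix.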

Again, thanks a lot to anyone that can provide comments, help or answers!

Kind regards

Vittorio

Thanks for the post. I’m trying to understand where scvi-tools fits into this pipeline. Do you want to smooth the gene expression values?

Sorry for the late reply, I must have missed the notification. I don’t want to smooth gene expression necessarily.
My doubt is: which results from scVI model training should I use for further analysis, and how?
I have seen that the scVI normalized expression values are used for visualization across all batches.
But can you use the same values as input for other tools? (in this example, infercnv)
