Hi scverse community,
I have several datasets: one comes from a cancer patient, while the others come from the same organ in a healthy condition.
The idea is to use the cell types of the healthy samples as reference cells for infercnvpy, but I first want to integrate the samples using the basic scVI workflow.
I have some questions that come from being new to the scverse ecosystem (I've always used R/Bioc tools), but I think the answers could be useful for many users.
The processing history of the samples is as follows:
1. Read alignment and quantification with Cell Ranger
2. QC, normalization, HVF selection, dim. reduction, clustering and annotation in R/Bioc (scater + Seurat)
3a. Merge all healthy samples into a single Seurat object
3b. Convert both (tumor + healthy) Seurat objects to SingleCellExperiment objects, retaining only the raw counts and the metadata (so the cell types too)
3c. Convert the SCE objects to H5AD files with zellkonverter, so that adata.X contains raw counts
4. Import the anndata objects in a notebook and merge them with anndata.concat
5. Normalization, HVF selection
6. Train an scVI model on the merged samples (I followed the "Introduction to scvi-tools" notebook on the scvi-tools website)
7a. Append gene positions to the anndata object
7b. Run infercnvpy (as shown in the "Infer CNV on lung cancer dataset" tutorial in the infercnvpy documentation)
8. Some post-processing
Now, the questions that I have are:
- is scVI going to alter the adata.X matrix?
- what is the anndata slot infercnvpy works on?
- most of all, what does infercnvpy expect in the anndata object?
- in general, is what I did reasonable (apart from the fact that starting directly with Scanpy would have avoided the object/file conversions)?
Thanks a lot to anyone who can provide comments, help or answers!
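For the first question, this is the kind of empirical check I have in mind: take a copy of adata.X before a step and compare it afterwards. It is only a sketch; `snapshot` and `same_matrix` are helper names I made up, and the check covers values only, not dtype or sparse format.

```python
import numpy as np
from scipy import sparse


def snapshot(matrix):
    """Return an independent copy of a dense or sparse matrix."""
    return matrix.copy()


def same_matrix(a, b):
    """Return True if two dense or sparse matrices hold identical values."""
    if a.shape != b.shape:
        return False
    if sparse.issparse(a) or sparse.issparse(b):
        # The != comparison yields a sparse boolean matrix of differing entries
        return (sparse.csr_matrix(a) != sparse.csr_matrix(b)).nnz == 0
    return np.array_equal(a, b)


# Intended usage (adata is the merged object):
# before = snapshot(adata.X)
# ... set up and train the scVI model ...
# print(same_matrix(before, adata.X))
```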
Kind regards
Vittorio