Hello scvi team,
I am trying to integrate datasets using scvi in RStudio. I normalized my Seurat object with SCTransform instead of log normalization, so I extracted the RNA counts, variable features, and metadata to create a new Seurat object and converted it to AnnData for integration with scvi.
This is the original Seurat object:
seurat
An object of class Seurat
64931 features across 33657 samples within 2 assays
Active assay: SCT (28330 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: RNA
This is the new Seurat object I created from the RNA counts:
seurat_n <- CreateSeuratObject(counts = LayerData(seurat_RNA, layer = "counts"))
VariableFeatures(seurat_n) <- VariableFeatures(seurat_RNA)
seurat_n@meta.data <- seurat_RNA@meta.data
seurat_n[["RNA"]] <- as(seurat_n[["RNA"]], "Assay")
adata <- convertFormat(seurat_n, from = "seurat", to = "anndata", main_layer = "counts", drop_single_values = FALSE)
This is the resulting adata:
print(adata)
AnnData object with n_obs × n_vars = 33657 × 36601
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'HPAP.ID', 'percent.mt', 'log10GenesPerUMI', 'age', 'ident', 'scDblFinder.sample', 'scDblFinder.class', 'scDblFinder.score', 'scDblFinder.weighted', 'scDblFinder.cxds_score', 'nCount_SCT', 'nFeature_SCT', '_scvi_batch', '_scvi_labels'
var: 'var.features', 'var.features.rank'
When I run the following code to set up the AnnData object for the scVI model, I receive the following error:
scvi$model$SCVI$setup_anndata(adata, batch_key = 'HPAP.ID')
/Users/seullee/miniforge3/envs/scvinew-env/lib/python3.9/site-packages/scvi/data/fields/_layer_field.py:115: UserWarning: Training will be faster when sparse matrix is formatted as CSR. It is safe to cast before model initialization.
_verify_and_correct_data_format(adata, self.attr_name, self.attr_key)
None
The same error occurs when I don’t specify the batch_key.
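For reference, the CSR cast that the message mentions can be sketched in plain SciPy. This is only an illustration on a toy matrix, not my actual data: `counts`, `x_csc`, and `x_csr` are hypothetical names, and the assumption that the converted matrix arrives in CSC layout is mine, not something the output above confirms.

```python
import numpy as np
from scipy import sparse

# Toy count matrix standing in for adata.X (hypothetical data: 3 cells x 4 genes).
counts = np.array([[0, 2, 0, 1],
                   [3, 0, 0, 0],
                   [0, 0, 5, 0]])

# Assumption: the matrix coming out of the R-to-Python conversion is stored as CSC.
x_csc = sparse.csc_matrix(counts)

# Cast to CSR; the stored values are unchanged, only the storage layout differs.
x_csr = sparse.csr_matrix(x_csc)

print(x_csc.format, "->", x_csr.format)
```

On a real AnnData object in Python, the equivalent step would presumably be `adata.X = sparse.csr_matrix(adata.X)` before calling `setup_anndata`.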
I would really appreciate your advice on what may be causing this error and your guidance on how to proceed.
Thank you very much in advance for your help!