Issues setting up anndata for SCVI

Hello scvi team,

I am trying to integrate datasets using scvi in RStudio. I normalized my Seurat object with SCTransform instead of log normalization, so I extracted the RNA counts, variable features, and metadata to create a new Seurat object and converted it to AnnData for integration with scvi.

This is the original seurat object.

An object of class Seurat
64931 features across 33657 samples within 2 assays
Active assay: SCT (28330 features, 3000 variable features)
3 layers present: counts, data,
1 other assay present: RNA

This is the new Seurat object I created with the RNA counts.

seurat_n <- CreateSeuratObject(counts = LayerData(seurat_RNA, layer = "counts"))
VariableFeatures(seurat_n) <- VariableFeatures(seurat_RNA)
seurat_n[["RNA"]] <- as(seurat_n[["RNA"]], "Assay")
adata <- convertFormat(seurat_n, from = "seurat", to = "anndata", main_layer = "counts", drop_single_values = FALSE)

This is the resulting adata.

AnnData object with n_obs × n_vars = 33657 × 36601
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'HPAP.ID', '', 'log10GenesPerUMI', 'age', 'ident', 'scDblFinder.sample', 'scDblFinder.class', 'scDblFinder.score', 'scDblFinder.weighted', 'scDblFinder.cxds_score', 'nCount_SCT', 'nFeature_SCT', '_scvi_batch', '_scvi_labels'
var: 'var.features', 'var.features.rank'

When I run the following code to set up the anndata for scvi model, I receive the following error.

scvi$model$SCVI$setup_anndata(adata, batch_key = "HPAP.ID")
/Users/seullee/miniforge3/envs/scvinew-env/lib/python3.9/site-packages/scvi/data/fields/ UserWarning: Training will be faster when sparse matrix is formatted as CSR. It is safe to cast before model initialization.
_verify_and_correct_data_format(adata, self.attr_name, self.attr_key)

The same error occurs when I don’t specify the batch_key.
I would really appreciate your advice on what may be causing this error and your guidance on how to proceed.
Thank you very much for your help in advance!

The model will train anyway, so this is a warning, not an error. You need to change the sparse format of the matrix. Even easier, and fine because your data is small, is to make your count matrix dense. Reading CSC (column-sparse) data row-wise (cell by cell) is an efficient procedure and will slow down your training.

@cane11, thanks for your guidance!
I’m a bit puzzled, though. You mentioned that handling column sparse data row-wise is efficient yet might slow down training. Did you mean it could actually speed up the process?

Also, could you advise on converting my data to CSR format prior to training? Creating a Seurat object with a dense matrix leads to an automatic conversion to a sparse dgCMatrix. Any tips on managing this would be greatly appreciated!
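
For anyone landing here with the same question: the earlier reply presumably meant "inefficient". Training reads one cell (row) at a time, and CSC stores the matrix column by column, so row-wise reads are the slow case that the warning is about. Seurat keeping a dgCMatrix on the R side is not a problem in itself; the cast can be done on the AnnData object after conversion and before setup_anndata. Below is a minimal Python sketch of that cast on a toy matrix, assuming only numpy and scipy. The matrix values are made up, and the commented adata lines assume your object is named adata as in the original post.

```python
import numpy as np
from scipy import sparse

# Toy counts matrix: 4 cells (rows) x 5 genes (columns), made-up values.
dense = np.array([
    [0, 2, 0, 0, 1],
    [3, 0, 0, 0, 0],
    [0, 0, 4, 0, 0],
    [0, 1, 0, 5, 0],
])

# Column-compressed storage, the layout the warning complains about.
x_csc = sparse.csc_matrix(dense)

# Cast to row-compressed storage so each cell's counts sit together in memory.
x_csr = x_csc.tocsr()

print(x_csc.format, "->", x_csr.format)  # csc -> csr

# indptr has one entry per column (+1) in CSC and one per row (+1) in CSR.
print(len(x_csc.indptr), len(x_csr.indptr))  # 6 5

# The values are unchanged; only the storage layout differs.
assert np.array_equal(x_csr.toarray(), dense)

# With an AnnData object (assumed name: adata) the same cast would be:
#   adata.X = sparse.csr_matrix(adata.X)
# followed by scvi.model.SCVI.setup_anndata(adata, batch_key="HPAP.ID")
```

From RStudio the same cast has to be applied to the Python-side object through reticulate. How exactly depends on whether reticulate auto-converts adata$X to an R matrix in your setup, so treat the commented lines as a starting point rather than a drop-in fix.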