TypeError when trying to set up scvi.model.TOTALVI()

hmnzo5gy · August 23, 2023, 7:25pm

When I try to set up a totalVI model on a MuData object with scvi.model.TOTALVI(mdata), I keep hitting this TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

model = scvi.model.TOTALVI(mdata)
INFO     Computing empirical prior initialization for protein background.                                          
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[17], line 1
----> 1 model = scvi.model.TOTALVI(mdata)

File ~/miniconda3/envs/scanpy/lib/python3.11/site-packages/scvi/model/_totalvi.py:142, in TOTALVI.__init__(self, adata, n_latent, gene_dispersion, protein_dispersion, gene_likelihood, latent_distribution, empirical_protein_background_prior, override_missing_proteins, **model_kwargs)
    136 emp_prior = (
    137     empirical_protein_background_prior
    138     if empirical_protein_background_prior is not None
    139     else (self.summary_stats.n_proteins > 10)
    140 )
    141 if emp_prior:
--> 142     prior_mean, prior_scale = self._get_totalvi_protein_priors(adata)
    143 else:
    144     prior_mean, prior_scale = None, None

File ~/miniconda3/envs/scanpy/lib/python3.11/site-packages/scvi/model/_totalvi.py:1163, in TOTALVI._get_totalvi_protein_priors(self, adata, n_cells)
   1161 for c in batch_pro_exp:
   1162     try:
-> 1163         gmm.fit(np.log1p(c.reshape(-1, 1)))
   1164     # when cell is all 0
   1165     except ConvergenceWarning:

File ~/miniconda3/envs/scanpy/lib/python3.11/site-packages/sklearn/mixture/_base.py:181, in BaseMixture.fit(self, X, y)
    155 """Estimate model parameters with the EM algorithm.
...
    538     )
    539 elif isinstance(accept_sparse, (list, tuple)):
    540     if len(accept_sparse) == 0:

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

I imported the cellranger multi results using muon.read_10x_h5, and the resulting mdata looks like this:

MuData object with n_obs × n_vars = 21812 × 19403
  var:	'gene_ids', 'feature_types', 'genome'
  3 modalities
    rna:	21812 x 15390
      obs:	'most_likely_hypothesis', 'Classification', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'batch'
      var:	'gene_ids', 'feature_types', 'genome', 'n_cells', 'mt', 'ribo', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
      uns:	'log1p', 'hvg'
      layers:	'counts'
    protein:	21812 x 13
      obs:	'most_likely_hypothesis', 'Classification', 'batch'
      var:	'gene_ids', 'feature_types', 'genome'
      layers:	'counts'
    rna_subset:	21812 x 4000
      obs:	'most_likely_hypothesis', 'Classification', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'total_counts_ribo', 'pct_counts_ribo', 'batch'
      var:	'gene_ids', 'feature_types', 'genome', 'n_cells', 'mt', 'ribo', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'highly_variable_nbatches'
      uns:	'log1p', 'hvg'
      layers:	'counts'

How can I resolve this problem with totalVI? Thank you

martinkim0 · August 28, 2023, 9:04pm

Hi, looks like you have to convert your protein data to dense format prior to using it with the model.
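
A minimal sketch of that conversion, using a toy matrix in place of the real protein counts. The helper name `densify` is hypothetical and not part of scvi-tools or muon. Writing the result back to `mdata["protein"].layers["counts"]` assumes that this is the layer the model was set up to read.

```python
import numpy as np
from scipy import sparse


def densify(matrix):
    """Return a dense numpy array whether `matrix` is sparse or already dense."""
    if sparse.issparse(matrix):
        return matrix.toarray()
    return np.asarray(matrix)


# Toy stand-in for a cells x proteins count matrix (2 cells, 3 proteins).
protein_counts = sparse.csr_matrix(np.array([[0, 3, 0], [5, 0, 1]]))
dense_counts = densify(protein_counts)

# With a real MuData object, the same helper could be applied in place
# (assumption: the protein counts live in the "counts" layer):
# mdata["protein"].layers["counts"] = densify(mdata["protein"].layers["counts"])
```

Checking with `sparse.issparse` first keeps the helper safe to call on data that is already dense.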

hmnzo5gy · August 29, 2023, 3:57am

I was able to get it to work with the following.

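import anndata as ad
import mudata as md  # assumption: `md` refers to the mudata package

adata = mdata["rna"].copy()
# The protein modality is named "protein" in the MuData printout above.
# .toarray() converts the sparse counts to a dense numpy array.
adata.obsm["protein_expression"] = mdata["protein"].layers["counts"].toarray()
protein_adata = ad.AnnData(adata.obsm["protein_expression"])
protein_adata.obs_names = adata.obs_names
del adata.obsm["protein_expression"]
mdata = md.MuData({"rna": adata, "protein": protein_adata})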

Is there an easier way to prepare the protein modality of mdata for totalVI directly after reading in the cellranger multi output .h5 file?
