Hi,
This is my first post here, so please excuse me if I did something wrong.
I have been using TOTALVI for protein-only CITE-seq analyses for quite a while, and I’m sure I have been able to get good results using scvi-tools 1.2.X or 1.4.X.
However, the following code generates an error:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotnine as p9
import scvi
import scanpy as sc
import scipy.io
sc.set_figure_params(figsize=(4, 4), color_map='cividis')
scvi.settings.seed = int(20021208)
np.random.seed(20021208)
adata = sc.read_h5ad(save_path_1+'/Hybridization_ann_single.h5')
SCITO_scvi_ann = scvi.data.organize_cite_seq_10x(adata)
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
adata.raw = adata
scvi.model.TOTALVI.setup_anndata(
    adata,
    protein_expression_obsm_key='protein_expression',
    batch_key='assignment',
    layer='counts',
)
And the full error traceback is as follows:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-42-769f92bccbda> in <module>
3 protein_expression_obsm_key = 'protein_expression',
4 batch_key='assignment',
----> 5 layer='counts')
5 frames
/usr/local/lib/python3.7/dist-packages/scvi/model/_totalvi.py in setup_anndata(cls, adata, protein_expression_obsm_key, protein_names_uns_key, batch_key, layer, size_factor_key, categorical_covariate_keys, continuous_covariate_keys, **kwargs)
1246 fields=anndata_fields, setup_method_args=setup_method_args
1247 )
-> 1248 adata_manager.register_fields(adata, **kwargs)
1249 cls.register_manager(adata_manager)
1250
/usr/local/lib/python3.7/dist-packages/scvi/data/_manager.py in register_fields(self, adata, source_registry, **transfer_kwargs)
175 field_registry[
176 _constants._STATE_REGISTRY_KEY
--> 177 ] = field.register_field(adata)
178
179 # Compute and set summary stats for the given field.
/usr/local/lib/python3.7/dist-packages/scvi/data/fields/_layer_field.py in register_field(self, adata)
94
95 def register_field(self, adata: AnnData) -> dict:
---> 96 super().register_field(adata)
97 if self.correct_data_format:
98 _verify_and_correct_data_format(adata, self.attr_name, self.attr_key)
/usr/local/lib/python3.7/dist-packages/scvi/data/fields/_base_field.py in register_field(self, adata)
65 stored directly on the AnnData/MuData object.
66 """
---> 67 self.validate_field(adata)
68 return dict()
69
/usr/local/lib/python3.7/dist-packages/scvi/data/fields/_layer_field.py in validate_field(self, adata)
84 x = self.get_field_data(adata)
85
---> 86 if self.is_count_data and not _check_nonnegative_integers(x):
87 logger_data_loc = (
88 "adata.X" if self.attr_key is None else f"adata.layers[{self.attr_key}]"
/usr/local/lib/python3.7/dist-packages/scvi/data/_utils.py in _check_nonnegative_integers(data, n_to_check)
204 raise TypeError("data type not understood")
205
--> 206 inds = np.random.choice(len(data), size=(n_to_check,))
207 check = jax.device_put(data.flat[inds], device=jax.devices("cpu")[0])
208 negative, non_integer = _is_not_count_val(check)
For your reference, the structure of adata right before calling setup_anndata is as follows:
AnnData object with n_obs × n_vars = 134136 × 0
obs: 'assignment', 'IGg_singlet', 'UMI_antibody_raw'
var: 'gene_ids', 'feature_types'
uns: 'random_seed', 'log1p', 'assignment_colors'
obsm: 'X_ADT_umap', 'protein_expression'
layers: 'counts'
adata.obsm['protein_expression'] has the expected shape (134136 x 136).
A brief search suggests that this kind of error is usually associated with empty pandas.DataFrame objects. However, I would not expect the gene expression data to be processed with pandas at all, so I am not sure that explanation applies here.
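To narrow this down, here is a minimal sketch of what I suspect is going on; it is a guess, not a confirmed diagnosis. It assumes that, because adata has 0 genes, the values reaching the np.random.choice line in the traceback form an empty 1-D array (for example, the stored values of a sparse matrix with no columns). The names empty_values and sample_indices and the value n_to_check = 20 are my own illustrative choices, not scvi-tools internals.

```python
import numpy as np

# Hypothetical stand-in for the data reaching the sampling step when the
# gene-expression matrix has zero columns (n_vars = 0): an empty 1-D array.
empty_values = np.array([], dtype=np.float32)

# Assumed sample size; the traceback only shows the parameter name n_to_check.
n_to_check = 20


def sample_indices(values, n):
    """Mimic the sampling line from the traceback: draw n random indices."""
    return np.random.choice(len(values), size=(n,))


try:
    sample_indices(empty_values, n_to_check)
    outcome = "no error"
except ValueError as err:
    # NumPy refuses to draw samples from a population of size 0.
    outcome = f"ValueError: {err}"

print(outcome)
```

If this is indeed the cause, the error would come from NumPy rather than pandas, and the real question becomes whether TOTALVI can be set up at all with an AnnData that has no gene columns.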