Normalized data found instead of raw counts

I’m running a scRNA-seq scVI workflow and getting warnings saying that non-integers were found in the AnnData:

adata.layers['counts'] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata

sc.pp.highly_variable_genes(
  adata,
  n_top_genes=3000,
  subset=True,
  layer="counts",
  flavor="seurat_v3",
  batch_key="dataset",
)

The warning here:

/opt/conda/envs/scvi-env/lib/python3.9/site-packages/scanpy/preprocessing/_highly_variable_genes.py:64: UserWarning: flavor='seurat_v3' expects raw count data, but non-integers were found.
  warnings.warn(

The same happens during scVI model setup:

scvi.model.SCVI.setup_anndata(
  adata,
  layer='counts',
  categorical_covariate_keys=['dataset', 'sample_name'],
  continuous_covariate_keys=['pct_counts_mt']
)
model = scvi.model.SCVI(adata)
model

/opt/conda/envs/scvi-env/lib/python3.9/site-packages/scvi/model/base/_base_model.py:150: UserWarning: Make sure the registered X field in anndata contains unnormalized count data.
  warnings.warn(

I checked the AnnData object just after loading it and before all the processing:

adata.X[1:10, 1:10].todense()

The matrix has integer values:

matrix([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0., 1., 0., 1., 0.]], dtype=float32)

So, does specifying layer="counts" not work for some reason?

Could you try:

np.unique(adata.layers["counts"].data)

It’s possible there are some non-integer values in there. Certain aligners will assign partial counts for ambiguous reads, which can trigger the warning.
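For a large matrix, listing every unique value can be slow and hard to read. A quicker yes/no check might look like the sketch below. The helper names `stored_values` and `has_non_integers` are made up for illustration and are not part of scanpy or scvi-tools; the sketch assumes the layer is either a dense NumPy array or a SciPy sparse matrix.

```python
import numpy as np


def stored_values(matrix):
    """Return all values of a dense array, or the stored entries of a sparse matrix."""
    # Assumption: anything with a `tocsr` method is a SciPy sparse matrix,
    # whose explicitly stored entries live in `.data`.
    if hasattr(matrix, "tocsr"):
        return np.asarray(matrix.tocsr().data)
    return np.asarray(matrix).ravel()


def has_non_integers(matrix):
    """True if any value has a fractional part (NaN also counts as non-integer)."""
    values = stored_values(matrix)
    return bool(np.any(values % 1 != 0))


# Toy data: the second row contains a fractional value.
toy = np.array([[0.0, 1.0, 3.0], [2.0, 0.0, 0.5]], dtype=np.float32)
print(has_non_integers(toy))      # True
print(has_non_integers(toy[:1]))  # False
```

On the object from the question this would presumably be called as has_non_integers(adata.layers["counts"]), which inspects every stored value rather than a small slice.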

Thanks, I concatenated multiple datasets and one of them was normalized. It was at the end, though, so I didn’t catch it by checking only the first rows.
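For anyone hitting the same issue after concatenation, one way to find the offending dataset is to count non-integer entries per batch. This is only a sketch on toy data: `non_integer_counts_by_group` is a hypothetical helper, and it assumes the matrix supports row indexing with an integer array (a dense array or a CSR matrix).

```python
import numpy as np


def non_integer_counts_by_group(matrix, groups):
    """Count entries with a fractional part, separately for each group label."""
    groups = np.asarray(groups)
    result = {}
    for group in np.unique(groups):
        # Select the rows (cells) that belong to this group.
        rows = matrix[np.flatnonzero(groups == group)]
        # Assumption: objects with `tocsr` are SciPy sparse matrices.
        if hasattr(rows, "tocsr"):
            values = np.asarray(rows.tocsr().data)
        else:
            values = np.asarray(rows).ravel()
        result[group] = int(np.count_nonzero(values % 1 != 0))
    return result


# Toy data: dataset "a" holds raw counts, dataset "b" was normalized.
counts = np.array(
    [[0.0, 1.0, 2.0], [3.0, 0.0, 1.0], [0.5, 0.0, 1.2], [0.0, 2.7, 0.0]]
)
labels = np.array(["a", "a", "b", "b"])
per_dataset = non_integer_counts_by_group(counts, labels)
for name, n_bad in per_dataset.items():
    print(name, n_bad)
```

With the AnnData object from this thread, the inputs would presumably be adata.layers["counts"] and adata.obs["dataset"].to_numpy(); any dataset with a non-zero count is a candidate for having been normalized before concatenation.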
