SOLO.from_scvi_model() throwing AttributeError


I’m using scvi.external.SOLO from scvi-tools for doublet detection, but calling SOLO.from_scvi_model() raises an AttributeError from inside AnnData.

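import scanpy as sc
import scvi

# Load the raw counts and transpose them (AnnData expects cells as rows).
adata = sc.read_csv('raw_counts/GSM5226574_C51ctr_raw_counts.csv').T
sc.pp.filter_genes(adata, min_cells=10)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, subset=True, flavor='seurat_v3')

# Register the AnnData with scvi-tools and train the model before handing it to SOLO.
scvi.model.SCVI.setup_anndata(adata)
vae = scvi.model.SCVI(adata)
vae.train()

# This call raises the AttributeError shown below.
solo = scvi.external.SOLO.from_scvi_model(vae)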

Here is the error traceback.

/home/ug1264/anaconda3/envs/py39/lib/python3.9/site-packages/anndata/_core/anndata.py:1755: FutureWarning: The AnnData.concatenate method is deprecated in favour of the anndata.concat function. Please use anndata.concat instead.

See the tutorial for concat at: https://anndata.readthedocs.io/en/latest/concatenation.html
AttributeError                            Traceback (most recent call last)
Cell In[10], line 1
----> 1 solo = scvi.external.SOLO.from_scvi_model(vae)
      2 solo.train()

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/scvi/external/solo/_model.py:204, in SOLO.from_scvi_model(cls, scvi_model, adata, restrict_to_batch, doublet_ratio, **classifier_kwargs)
    199     doublet_adata = AnnData(
    200         np.concatenate([doublet_latent_rep, np.log(doublet_lib_size)], axis=1)
    201     )
    202     doublet_adata.obs[LABELS_KEY] = "doublet"
--> 204     full_adata = latent_adata.concatenate(doublet_adata)
    205     cls.setup_anndata(full_adata, labels_key=LABELS_KEY)
    206 return cls(full_adata, **classifier_kwargs)

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/anndata/_core/anndata.py:1808, in AnnData.concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1799 pat = rf"-({'|'.join(batch_categories)})$"
   1800 out.var = merge_dataframes(
   1801     [a.var for a in all_adatas],
   1802     out.var_names,
   1803     partial(merge_outer, batch_keys=batch_categories, merge=merge_same),
   1804 )
   1805 out.var = out.var.iloc[
   1806     :,
   1807     (
-> 1808         out.var.columns.str.extract(pat, expand=False)
   1809         .fillna("")
   1810         .argsort(kind="stable")
   1811     ),
   1812 ]
   1814 return out

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
    229 object.__setattr__(obj, self._name, accessor_obj)

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/strings/accessor.py:181, in StringMethods.__init__(self, data)
    178 def __init__(self, data) -> None:
    179     from pandas.core.arrays.string_ import StringDtype
--> 181     self._inferred_dtype = self._validate(data)
    182     self._is_categorical = is_categorical_dtype(data.dtype)
    183     self._is_string = isinstance(data.dtype, StringDtype)

File ~/anaconda3/envs/py39/lib/python3.9/site-packages/pandas/core/strings/accessor.py:235, in StringMethods._validate(data)
    232 inferred_dtype = lib.infer_dtype(values, skipna=True)
    234 if inferred_dtype not in allowed_types:
--> 235     raise AttributeError("Can only use .str accessor with string values!")
    236 return inferred_dtype

AttributeError: Can only use .str accessor with string values!

I think this has been fixed in a recent commit to scverse/scvi-tools: "SOLO.from_scvi_model concatenate with str var names" (#2013, commit 6b0fd27).
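For context, the last frame of the traceback is pandas refusing to create the .str accessor because the index it was given does not hold strings. Here is a minimal sketch of that behaviour in isolation. The integer index is my assumption, standing in for whatever out.var.columns contains inside SOLO (the post does not show it), and str_accessor_error is just a hypothetical helper name.

```python
import pandas as pd


def str_accessor_error(index: pd.Index):
    """Return the AttributeError message raised by `.str`, or None if it works."""
    try:
        _ = index.str  # pandas validates the values when the accessor is created
    except AttributeError as err:
        return str(err)
    return None


# String labels: the accessor is created without complaint.
string_columns = pd.Index(["gene-0", "gene-1"])
# Integer labels (assumed stand-in for the problematic columns): the accessor is rejected.
integer_columns = pd.Index([0, 1])

print(str_accessor_error(string_columns))
print(str_accessor_error(integer_columns))
```

The first call prints None and the second prints the pandas error message, which is the same kind of failure as the one at the bottom of the traceback.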

You can wait until the next version is released, but what I did was patch the copy installed in my conda environment. Open ~/anaconda3/envs/py39/lib/python3.9/site-packages/scvi/external/solo/_model.py

and replace line 204, which contains:

full_adata = latent_adata.concatenate(doublet_adata)

with the following two lines, keeping the indentation of the original line:

import anndata
full_adata = anndata.concat([latent_adata, doublet_adata])
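If you would rather not edit the file by hand, the same replacement can be scripted. This is only a sketch of the manual edit described above: patch_solo_source and patch_file are hypothetical helper names, and it assumes the old call appears exactly once in the file and on a line of its own.

```python
import re
import shutil
from pathlib import Path

# Matches the deprecated call on its own line and captures its indentation.
OLD_CALL = re.compile(
    r"^(?P<indent>[ \t]*)full_adata = latent_adata\.concatenate\(doublet_adata\)[ \t]*$",
    re.MULTILINE,
)


def patch_solo_source(source: str) -> str:
    """Swap the deprecated AnnData.concatenate call for anndata.concat."""

    def _replacement(match: re.Match) -> str:
        indent = match.group("indent")
        return (
            f"{indent}import anndata\n"
            f"{indent}full_adata = anndata.concat([latent_adata, doublet_adata])"
        )

    patched, count = OLD_CALL.subn(_replacement, source)
    if count != 1:
        # Refuse to guess if the file does not look like the one in the traceback.
        raise ValueError(f"expected exactly one match, found {count}")
    return patched


def patch_file(path: Path) -> None:
    """Patch `path` in place, keeping the original next to it as `<name>.bak`."""
    patched = patch_solo_source(path.read_text())
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(patched)


# Demonstration on a snippet shaped like the lines shown in the traceback.
example = (
    '        doublet_adata.obs[LABELS_KEY] = "doublet"\n'
    "\n"
    "        full_adata = latent_adata.concatenate(doublet_adata)\n"
    "        cls.setup_anndata(full_adata, labels_key=LABELS_KEY)\n"
)
print(patch_solo_source(example))
```

To apply it for real, call patch_file with the path to _model.py in your own environment. The backup copy makes it easy to undo the change.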

Disclaimer: I'm not sure how this edit will interact with an upgrade once the new version is out. Perhaps nothing happens and the package updates just fine. I will deal with that when it comes, but keep it in mind if you follow this hack.

Thanks for the answer!! Everything appears to work fine now.