ZeroDivisionError using scvi-tools

yojetsharma · September 9, 2024, 5:52pm

I am running snapatac2 tutorial wherein scvi-tools are used to map reference to query (so I am a new scvi-tools at the moment). Reference is a scRNA dataset and query is an atac seq:

query= snap.pp.make_gene_matrix(atac, snap.genome.hg38)
query
AnnData object with n_obs × n_vars = 58534 × 60606
    obs: 'sample', 'leiden'
reference=snap.read("GEX.h5ad", backed=None)
AnnData object with n_obs × n_vars = 187285 × 2000
    obs: 'sample', 'cell_type'
    var: 'highly_variable'
query.obs['cell_type']=pd.NA
data = ad.concat(
    [reference, query],
    join='inner',
    label='batch',
    keys=["reference", "query"],
    index_unique='_',
)
data
AnnData object with n_obs × n_vars = 245819 × 1397
    obs: 'sample', 'cell_type', 'batch'
sc.pp.filter_genes(data, min_cells=5)
sc.pp.highly_variable_genes(
    data,
    n_top_genes = 3000,
    flavor="seurat_v3",
    batch_key="batch",
    subset=True
)
scvi.model.SCVI.setup_anndata(data, batch_key="batch")
vae = scvi.model.SCVI(
    data,
    n_layers=2,
    n_latent=30,
    gene_likelihood="nb",
    dispersion="gene-batch",
)
vae.train(max_epochs=1000, early_stopping=True)
INFO: GPU available: True (cuda), used: True
2024-09-10 00:49:45 - INFO - GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
2024-09-10 00:49:45 - INFO - TPU available: False, using: 0 TPU cores
INFO: IPU available: False, using: 0 IPUs
2024-09-10 00:49:45 - INFO - IPU available: False, using: 0 IPUs
INFO: HPU available: False, using: 0 HPUs
2024-09-10 00:49:45 - INFO - HPU available: False, using: 0 HPUs
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
2024-09-10 00:49:45 - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.

Epoch 984/1000:  98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉  | 984/1000 [1:50:50<01:48,  6.76s/it, v_num=1, train_loss_step=399, train_loss_epoch=428]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: 425.723. Signaling Trainer to stop.

ax = vae.history['elbo_train'][1:].plot()
vae.history['elbo_validation'].plot(ax=ax)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[22], line 2
      1 ax = vae.history['elbo_train'][1:].plot()
----> 2 vae.history['elbo_validation'].plot(ax=ax)

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_core.py:1000, in PlotAccessor.__call__(self, *args, **kwargs)
    997             label_name = label_kw or data.columns
    998             data.columns = label_name
-> 1000 return plot_backend.plot(data, kind=kind, **kwargs)

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py:71, in plot(data, kind, **kwargs)
     69         kwargs["ax"] = getattr(ax, "left_ax", ax)
     70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
     72 plot_obj.draw()
     73 return plot_obj.result

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:454, in MPLPlot.generate(self)
    452 self._make_plot()
    453 self._add_table()
--> 454 self._make_legend()
    455 self._adorn_subplots()
    457 for ax in self.axes:

File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:792, in MPLPlot._make_legend(self)
    790     title = leg.get_title().get_text()
    791     # Replace leg.LegendHandles because it misses marker info
--> 792     handles = leg.legendHandles
    793     labels = [x.get_text() for x in leg.get_texts()]
    795 if self.legend:

AttributeError: 'Legend' object has no attribute 'legendHandles'

data.obs["celltype_scanvi"] = 'Unknown'
ref_idx = data.obs['batch'] == "reference"
data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]
/tmp/ipykernel_2619671/134013430.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]

lvae = scvi.model.SCANVI.from_scvi_model(
    vae,
    adata=data,
    labels_key="celltype_scanvi",
    unlabeled_category="Unknown",
)

cane11 · September 9, 2024, 6:06pm

Hi, we have never tested transfering to gene activation scores and it doesn’t sound right to me. However, to fix your issue. You are setting all celltypes to None. The way you are then updating it with celltypes is incorrect and it won’t update your matrix (see pandas warning). You should do:

data.obs.loc[ref_idx, "celltype_scanvi"] = data.obs.loc[ref_idx, 'cell_type']

yojetsharma · September 9, 2024, 6:56pm

Noted, will try this. Thank you!
However, I’m confused regarding usage of SCVI-tools for reference mapping. Since scvi is a part of the SNAPATAC2 tutorial, I didn’t think too much about it, until now.
Can you please tell me why it’s not the right approach?

cane11 · September 9, 2024, 7:03pm

Gene activation scores from ATAC data is not measured gene expression using those values equivalent to gene expression sounds weird to me and we don’t support this. ScVI is meant to integrate datasets with measured gene expression data. If the developers of SNAPATAC2 wants to highlight this use case, it’s best to check with them why this is justified and how they tested performance.

yojetsharma · September 9, 2024, 7:21pm

Their nature methods manuscript doesn’t really talk about it. So, I will post it there. Thank you for your suggestions!

yojetsharma · September 10, 2024, 5:15am

Update: I am still getting the zerodivisionerror. Will ask on the snapatac2 forum regarding the same.

EmiDalla · September 10, 2024, 12:45pm

Hello, I’m having exactly the same issue following exactly the same tutorial.
It is definitely weird that SnapATAC2 authors suggest to follow this method, however the code used in the tutorial was working for me until I subset the reference: at that point, I suppose some cell group became empty and the error prompted.
I tried using the code recommended by cane11 but got the same error.

@ yojetsharma could you please provide the link to the the snapatac2 forum, so that I can follow the discussion there?
EDIT: found it

cane11 · September 10, 2024, 2:24pm

I’m very certain that both of you are running scANVI without any cell-type labels. It would be helpful to report adata.obs[LABEL_KEY].value_counts() as well as adata.obs[‘_scvi_labels’].value_counts().

yojetsharma · September 10, 2024, 3:20pm

adata.obs[‘_scvi_labels’].value_counts():
0 245819
Name: _scvi_labels, dtype: int64
adata.obs[‘cell_type’].value_counts():
Neuron 63422
Radial glia 60436
Neuroblast 27840
Neuronal IPC 18388
Glioblast 17199
Name: cell_type, dtype: int64

cane11 · September 10, 2024, 3:36pm

Obviously, your labels are not setup properly. In addition, you send the content of cell_type but use celltype_scANVI to set up the model. celltype_scanvi likely contains only unknown currently. With the suggested line in pandas you should be able to update the content.

yojetsharma · September 10, 2024, 3:39pm

Yes, it is solved now. Thank you for this! I feel so stupid right now.

EmiDalla · September 11, 2024, 11:02am

Could I kindly ask you what you did to fix the issue? I just started using python and I certainly missed something very important, but using the modified command provided by @ cane11 didn’t solve the problem for me.

Besides, in the SnapATAC2 tutorial I also got this

data_pbmc5k.obs['_scvi_labels'].value_counts()
_scvi_labels
0    14061
Name: count, dtype: int64

but it didn’t prevent the code to run “properly”.

One last thing: does the code expect to have the same barcodes used in both scRNA-seq and scATAC-seq data? This might be my problem…

Thank you.

yojetsharma · September 11, 2024, 12:21pm

In my case, the error i found out was due to cell_type column in my reference. I didn’t realize the reference i was populating itself had annotation as NaN for some clusters. After correcting this it ran fine.

I am still concerned with what cane11 pointed out regarding use of scvi-tool to annotate atac dataset using rna.

yojetsharma · September 22, 2024, 9:01pm

Were you able to solve it?

EmiDalla · September 23, 2024, 5:42am

Unfortunately not, as I previously said there is probably a need for the same barcodes being used in both datasets, which is not my case.

yojetsharma · September 23, 2024, 3:07pm

I remember asking Kai Zhang on their repository in one of the issues regarding this and he replied that it’s not needed.
And I have actually also mapped my ATAC onto the an independent RNA. It worked after realising the issue.

Topic		Replies	Views
Error in SCANVI.prepare_query_anndata scvi-tools scanvi	5	958	July 14, 2022
Interpreting validation loss curve in query to reference mapping scvi-tools	3	1162	November 15, 2022
CellAssign keyword error: After Integration scvi-tools scvi	3	796	May 26, 2022
totalVI workflow scvi-tools totalvi	12	682	August 1, 2021
Error in scvi.model.TOTALVI.setup_anndata when loading protein-only data scvi-tools totalvi	5	745	August 25, 2022

ZeroDivisionError using scvi-tools

Related topics