I am running the SnapATAC2 tutorial in which scvi-tools is used to map a reference onto a query (I am new to scvi-tools at the moment). The reference is an scRNA-seq dataset and the query is an ATAC-seq dataset:
query = snap.pp.make_gene_matrix(atac, snap.genome.hg38)
query
AnnData object with n_obs × n_vars = 58534 × 60606
obs: 'sample', 'leiden'
reference = snap.read("GEX.h5ad", backed=None)
AnnData object with n_obs × n_vars = 187285 × 2000
obs: 'sample', 'cell_type'
var: 'highly_variable'
query.obs['cell_type'] = pd.NA
data = ad.concat(
[reference, query],
join='inner',
label='batch',
keys=["reference", "query"],
index_unique='_',
)
data
AnnData object with n_obs × n_vars = 245819 × 1397
obs: 'sample', 'cell_type', 'batch'
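Note on the shapes above: with join='inner', the concatenation keeps only the variables present in both objects, so 1397 of the 2000 reference genes survive and 603 are not found by name in the query gene matrix. That is also fewer than the n_top_genes = 3000 requested in the next step, so that step presumably cannot narrow the set further. A minimal sketch of how the overlap could be inspected before concatenating; the gene names are made up and stand in for reference.var_names and query.var_names:

```python
import pandas as pd

# Hypothetical gene names standing in for reference.var_names and query.var_names.
reference_genes = pd.Index(["CD3E", "MS4A1", "LYZ", "NKG7"])
query_genes = pd.Index(["CD3E", "LYZ", "GAPDH", "ACTB", "NKG7"])

# Genes an inner join would keep, and reference genes it would drop.
shared = reference_genes.intersection(query_genes)
missing = reference_genes.difference(query_genes)

print(len(shared), "shared;", len(missing), "reference genes absent from the query")
```

Looking at a few entries of the missing set may show whether the two objects use different naming schemes (for example symbols versus Ensembl IDs); that possible cause is an assumption, not something the output above confirms.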
sc.pp.filter_genes(data, min_cells=5)
sc.pp.highly_variable_genes(
data,
n_top_genes = 3000,
flavor="seurat_v3",
batch_key="batch",
subset=True
)
scvi.model.SCVI.setup_anndata(data, batch_key="batch")
vae = scvi.model.SCVI(
data,
n_layers=2,
n_latent=30,
gene_likelihood="nb",
dispersion="gene-batch",
)
vae.train(max_epochs=1000, early_stopping=True)
2024-09-10 00:49:45 - INFO - GPU available: True (cuda), used: True
2024-09-10 00:49:45 - INFO - TPU available: False, using: 0 TPU cores
2024-09-10 00:49:45 - INFO - IPU available: False, using: 0 IPUs
2024-09-10 00:49:45 - INFO - HPU available: False, using: 0 HPUs
2024-09-10 00:49:45 - INFO - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
/home/user/miniconda3/envs/scvi-env/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=31` in the `DataLoader` to improve performance.
Epoch 984/1000: 98%|█████████▉| 984/1000 [1:50:50<01:48, 6.76s/it, v_num=1, train_loss_step=399, train_loss_epoch=428]
Monitored metric elbo_validation did not improve in the last 45 records. Best score: 425.723. Signaling Trainer to stop.
ax = vae.history['elbo_train'][1:].plot()
vae.history['elbo_validation'].plot(ax=ax)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[22], line 2
1 ax = vae.history['elbo_train'][1:].plot()
----> 2 vae.history['elbo_validation'].plot(ax=ax)
File ~/.local/lib/python3.10/site-packages/pandas/plotting/_core.py:1000, in PlotAccessor.__call__(self, *args, **kwargs)
997 label_name = label_kw or data.columns
998 data.columns = label_name
-> 1000 return plot_backend.plot(data, kind=kind, **kwargs)
File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/__init__.py:71, in plot(data, kind, **kwargs)
69 kwargs["ax"] = getattr(ax, "left_ax", ax)
70 plot_obj = PLOT_CLASSES[kind](data, **kwargs)
---> 71 plot_obj.generate()
72 plot_obj.draw()
73 return plot_obj.result
File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:454, in MPLPlot.generate(self)
452 self._make_plot()
453 self._add_table()
--> 454 self._make_legend()
455 self._adorn_subplots()
457 for ax in self.axes:
File ~/.local/lib/python3.10/site-packages/pandas/plotting/_matplotlib/core.py:792, in MPLPlot._make_legend(self)
790 title = leg.get_title().get_text()
791 # Replace leg.LegendHandles because it misses marker info
--> 792 handles = leg.legendHandles
793 labels = [x.get_text() for x in leg.get_texts()]
795 if self.legend:
AttributeError: 'Legend' object has no attribute 'legendHandles'
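Note on the traceback above: the failure is inside pandas' plotting code, not in scvi-tools. The first .plot() call succeeds, and the second fails when pandas reads leg.legendHandles from the legend already on the axes. Newer matplotlib releases renamed that attribute to legend_handles, so this looks like a mismatch between the installed pandas and matplotlib versions; upgrading pandas is one likely fix. A minimal sketch of a workaround that bypasses pandas plotting and draws both curves with matplotlib directly. The history dict and its numbers are made-up stand-ins, assuming vae.history is a dict of one-column DataFrames indexed by epoch:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for vae.history; the values are illustrative only.
history = {
    "elbo_train": pd.DataFrame(
        {"elbo_train": [520.0, 450.0, 435.0, 430.0]},
        index=pd.RangeIndex(4, name="epoch"),
    ),
    "elbo_validation": pd.DataFrame(
        {"elbo_validation": [500.0, 445.0, 432.0, 428.0]},
        index=pd.RangeIndex(4, name="epoch"),
    ),
}

fig, ax = plt.subplots()

# Skip the first epoch of the training curve, as in the original cell.
train = history["elbo_train"].iloc[1:]
val = history["elbo_validation"]

# Plot with matplotlib itself so pandas' legend handling is never invoked.
ax.plot(train.index, train["elbo_train"], label="elbo_train")
ax.plot(val.index, val["elbo_validation"], label="elbo_validation")
ax.set_xlabel("epoch")
ax.set_ylabel("ELBO")
ax.legend()
```

In the notebook, history would be replaced by vae.history. The training itself is unaffected by this error, since it only occurs while drawing the legend.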
data.obs["celltype_scanvi"] = 'Unknown'
ref_idx = data.obs['batch'] == "reference"
data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]
/tmp/ipykernel_2619671/134013430.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
data.obs["celltype_scanvi"][ref_idx] = data.obs['cell_type'][ref_idx]
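Note on the warning above: data.obs["celltype_scanvi"][ref_idx] = ... is chained indexing, which is what pandas is flagging. It selects the column first and then assigns into the result, which may be a copy. A minimal sketch of the same assignment as a single .loc call, using a small made-up DataFrame in place of data.obs:

```python
import pandas as pd

# Stand-in for data.obs; the rows are illustrative only.
obs = pd.DataFrame(
    {
        "batch": ["reference", "reference", "query"],
        "cell_type": ["T cell", "B cell", pd.NA],
    }
)

# Default every cell to the unlabeled category.
obs["celltype_scanvi"] = "Unknown"
ref_idx = obs["batch"] == "reference"

# One .loc call selects rows and column together, so the write hits obs itself.
# astype(str) is a precaution in case cell_type is stored as a categorical.
obs.loc[ref_idx, "celltype_scanvi"] = obs.loc[ref_idx, "cell_type"].astype(str)

print(obs["celltype_scanvi"].tolist())
```

Applied to the real object, this would read data.obs.loc[ref_idx, "celltype_scanvi"] = data.obs.loc[ref_idx, "cell_type"].astype(str). It is worth checking data.obs["celltype_scanvi"].value_counts() afterwards to confirm the reference labels were actually written.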
lvae = scvi.model.SCANVI.from_scvi_model(
vae,
adata=data,
labels_key="celltype_scanvi",
unlabeled_category="Unknown",
)