Encountering Error in Label Transfer: Query Dataset Slightly Larger than Reference Dataset

Hello All,

I am working on label transfer using scANVI. The reference dataset contains 460,000 cells and the query dataset contains 656,000 cells, both from the same brain region. The reference dataset comprises 65 cell types, and I expect the same cell types in the query data. Our goal is to transfer labels from the reference to the query data. However, I am encountering an error message during the prediction process.

scvi-tools version: 1.0.0

Code:
lvae = scvi.model.SCANVI.from_scvi_model(vae, adata=adata, unlabeled_category="Unknown", labels_key=name_of_the_new_col, linear_classifier=True)
lvae.train(max_epochs=200, n_samples_per_label=1000)

ValueError: Expected parameter loc (Tensor of shape (128, 30)) of distribution Normal(loc: torch.Size([128, 30]), scale: torch.Size([128, 30])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        ...,
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       grad_fn=<AddmmBackward0>)

However, there are no NaN values in .obs or .var of the AnnData object.

Please help me address this. Do I need to tweak n_samples_per_label?

Thanks
Akila

Hi, sorry you’re experiencing this issue. Is the error occurring in the first epoch of training or sometime after? In addition, could you try out the following potential fixes and let me know if any of them work?

  • Downgrade to scvi-tools 0.20.3
  • Set scvi.settings.seed = 0 before training
  • Set linear_classifier=False
  • Set n_samples_per_label to something much lower, maybe like 50
  • Pass in var_activation=torch.nn.functional.softplus when initializing SCVI
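Put together, the in-code suggestions above might look roughly like the sketch below. It is only an illustration: train_scanvi_with_fixes is a hypothetical helper name, and it assumes adata has already been registered with scvi.model.SCVI.setup_anndata and that labels_key points at the label column from the original post. The linear_classifier argument is copied from that post. The downgrade is a separate step (pip install scvi-tools==0.20.3) and is not shown.

```python
def train_scanvi_with_fixes(adata, labels_key, unlabeled_category="Unknown"):
    """Hypothetical helper: retrain SCVI, then SCANVI, with the suggested settings.

    Assumes `adata` was already registered via scvi.model.SCVI.setup_anndata.
    """
    # Imported here so the helper can be defined before the libraries load.
    import scvi
    import torch

    # Fix the random seed before any model is created or trained.
    scvi.settings.seed = 0

    # softplus keeps the encoder's variance output positive and smooth.
    vae = scvi.model.SCVI(adata, var_activation=torch.nn.functional.softplus)
    vae.train()

    lvae = scvi.model.SCANVI.from_scvi_model(
        vae,
        adata=adata,
        unlabeled_category=unlabeled_category,
        labels_key=labels_key,
        linear_classifier=False,  # argument name taken from the original post
    )
    # Far fewer labelled cells per class than the original 1000.
    lvae.train(max_epochs=200, n_samples_per_label=50)
    return lvae
```

Trying the changes one at a time rather than all at once should make it easier to tell which of them, if any, matters.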

Thank you for your suggestion.

I followed your advice by setting linear_classifier=False and decreasing n_samples_per_label to 100. However, the run failed at epoch 52, and the same error persists.

ValueError: Expected parameter loc (Tensor of shape (128, 30)) of distribution Normal(loc: torch.Size([128, 30]), scale: torch.Size([128, 30])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]

Is there any way to figure out the cause?

Thanks
Akila

Were you able to try out the other suggestions too? I would give them a try if possible.

In general, it’s hard to unambiguously diagnose NaN errors since they can be due to a combination of the model architecture, training stability, and the quality of the data. If the other points I suggested don’t work out for you, I would recommend plotting some quality metrics for your dataset, such as the number of genes expressed per cell, the number of cells expressing each gene, and the observed library size per cell, and doing additional preprocessing as needed.
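As a starting point for those checks, here is a minimal sketch that computes the metrics directly from the count matrix with NumPy and SciPy. count_matrix_report is a hypothetical helper, not part of scvi-tools. It also counts non-finite, negative, and non-integer entries, on the assumption that the model is trained on raw counts in .X or a layer, so NaN-free .obs and .var do not rule out problems in the matrix itself. Cells with zero total counts are flagged as well, since they may be one possible source of instability.

```python
import numpy as np
from scipy import sparse


def count_matrix_report(X):
    """Summarize basic QC metrics for a cells x genes matrix (dense or sparse)."""
    X = sparse.csr_matrix(X)
    values = X.data  # explicitly stored entries only
    finite = values[np.isfinite(values)]

    # Per-cell and per-gene summaries.
    library_size = np.asarray(X.sum(axis=1)).ravel()
    genes_per_cell = np.asarray((X > 0).sum(axis=1)).ravel()
    cells_per_gene = np.asarray((X > 0).sum(axis=0)).ravel()

    return {
        "n_nonfinite": int((~np.isfinite(values)).sum()),
        "n_negative": int((finite < 0).sum()),
        "n_noninteger": int((finite != np.round(finite)).sum()),
        "n_empty_cells": int((library_size == 0).sum()),
        "n_unexpressed_genes": int((cells_per_gene == 0).sum()),
        "library_size": library_size,
        "genes_per_cell": genes_per_cell,
        "cells_per_gene": cells_per_gene,
    }
```

You could call it as count_matrix_report(adata.X), or on whichever layer holds the counts used for training, and then plot histograms of the three arrays. Any nonzero value among the first three counters would be worth resolving before retraining.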