SCANVI Model Loading Issue: Partial Data and Workaround

Hello, I’m running into an issue when loading a pre-trained SCANVI model with data that contains only a subset of the cell types it was trained on. All of the data I use is fully labeled.

The model is initialized and trained on fully labeled data in which every cell type is present, and then saved. But when I try to load it with partial data (some cell types missing, but every cell still labeled), I get a size mismatch error.

As a workaround, I noticed that if I add a single placeholder row to the partial dataset, labeled with the unlabeled_category the SCANVI model was set up with, the model loads without issues and can be trained.
The placeholder is simply the last row of the dataset, duplicated, with its cell type changed to the unlabeled category.
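
To make the workaround concrete, here is a minimal sketch of the placeholder step on the cell metadata table alone, using pandas. The column name "cell_type", the label "Unknown", the "_placeholder" index suffix, and the helper name add_unlabeled_placeholder are hypothetical stand-ins for my labels key and unlabeled_category, not anything from the SCANVI API. With a real AnnData object, the matching expression row has to be duplicated as well.

```python
import pandas as pd


def add_unlabeled_placeholder(obs, labels_key, unlabeled_category):
    """Return a copy of `obs` with its last row duplicated and relabeled.

    Hypothetical helper: it only touches the metadata table, not the counts.
    """
    out = obs.copy()
    # Use plain objects so a label outside the existing categories can be set.
    out[labels_key] = out[labels_key].astype(object)

    # Duplicate the last row and mark it as unlabeled.
    placeholder = out.iloc[[-1]].copy()
    placeholder[labels_key] = unlabeled_category
    placeholder.index = [f"{out.index[-1]}_placeholder"]

    out = pd.concat([out, placeholder])
    # Restore a categorical column that now includes the unlabeled category.
    out[labels_key] = out[labels_key].astype("category")
    return out


# Toy partial dataset: fully labeled, but only some cell types present.
obs = pd.DataFrame(
    {"cell_type": ["T cell", "B cell", "T cell"]},
    index=["c0", "c1", "c2"],
)
patched = add_unlabeled_placeholder(obs, "cell_type", "Unknown")
print(patched)
```

The only change is one extra row carrying the unlabeled category, which matches the case where loading succeeds for me.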

So I see three scenarios when I load the model:

  • All the cell types in the data → works
  • Partial cell types in the data → doesn’t work
  • Partial cell types, plus a placeholder row labeled with unlabeled_category → works

Is there a better way to load a SCANVI model with partial cell type data when the data contains no unlabeled cells, or is ensuring all cell types are present the only proper solution?