I’m trying to map some human samples to a Tabula Sapiens reference using an SCVI model I trained on that reference, but our internal data are missing some of the vars (genes) present in the reference, so I get:
ValueError: Number of vars in adata_target not the same as source.
Is the right approach to intersect the Tabula Sapiens genes with those in our data and retrain the reference model on that shared subset? Or is it OK to add the missing genes to our data with counts set to 0? (I’ve seen the zero-filling approach described for the TOTALVI model, but not for SCVI/scANVI.)
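To make the two options concrete, here is a minimal NumPy sketch of each, assuming a dense cells × genes count matrix and plain lists of gene names. The helper names `align_to_reference` and `intersect_genes` are hypothetical and for illustration only. It does not settle which option is statistically appropriate; recent scvi-tools versions also appear to ship `SCVI.prepare_query_anndata(adata, reference_model)` for the zero-padding case, which is worth checking against your installed version.

```python
import numpy as np

def align_to_reference(query_counts, query_genes, ref_genes):
    """Option 2 (zero-padding): reorder query columns to match ref_genes,
    inserting all-zero columns for reference genes the query lacks.
    Genes present only in the query are dropped."""
    col_of = {g: i for i, g in enumerate(query_genes)}  # gene -> query column
    aligned = np.zeros((query_counts.shape[0], len(ref_genes)),
                       dtype=query_counts.dtype)
    missing = []
    for j, gene in enumerate(ref_genes):
        i = col_of.get(gene)
        if i is None:
            missing.append(gene)  # stays an all-zero column
        else:
            aligned[:, j] = query_counts[:, i]
    return aligned, missing

def intersect_genes(ref_genes, query_genes):
    """Option 1 (intersection): genes shared by both datasets, in reference
    order; the reference model would then be retrained on only these."""
    query_set = set(query_genes)
    return [g for g in ref_genes if g in query_set]

# Toy example with made-up gene names and counts (2 cells x 3 genes).
ref_genes = ["CD3E", "CD4", "MS4A1", "NKG7"]
query_genes = ["NKG7", "CD3E", "LYZ"]
query_counts = np.array([[1, 2, 3], [4, 5, 6]])

aligned, missing = align_to_reference(query_counts, query_genes, ref_genes)
shared = intersect_genes(ref_genes, query_genes)
```

Here `aligned` is `[[2, 0, 0, 1], [5, 0, 0, 4]]`, `missing` is `["CD4", "MS4A1"]`, and `shared` is `["CD3E", "NKG7"]`. The trade-off is that zero-padding keeps the existing reference model but tells it those genes were measured as zero, while intersection avoids that assumption at the cost of retraining the reference.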