Thanks for the great framework! Occasionally, when I build a model for a particular cell type, the majority of the cells cluster together while some are severe outliers and clearly not of the same cell type. I can easily remove these cells; however, they are then absent from the trained model, so I cannot load the model with that dataset or perform differential expression. Is there a simple way to update the trained model with the outliers excluded, without having to train a new model?
You should be able to use the model in any way, regardless of whether you have removed some cells from the dataset.
What kind of error are you seeing?
I could still use it, but the cells that are clearly outliers are still included and will affect any analysis performed with the model. For example, if I recompute neighbors from the latent space, the projections only get worse. The model still seems to retain information about the outlier cells in the latent space. My concern is that this also affects the batch conditioning, normalization, and DEG. Similarly, in order to perform DE I would need to do some fancy indexing to exclude the outlier cells. It would be much easier and cleaner if it were possible to update the model.
You can pass indices, or an AnnData object, to most downstream tasks. If you pass your filtered object, you will get results for only those cells. The DE function also has an ‘importance’ weighting option. Enabling it downweights the effect of these outliers without removing them. This downweighting is described in: https://www.pnas.org/doi/full/10.1073/pnas.2209124120
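To make the index-based route concrete, here is a minimal sketch rather than an official recipe. The helper names (`kept_positions`, `latent_without_outliers`, `de_without_outliers`) are hypothetical. It assumes a scvi-tools style model whose `get_latent_representation` accepts `indices` and whose `differential_expression` accepts `idx1`, `idx2`, and `weights="importance"`; please check these against the version you have installed.

```python
from typing import Any, Iterable, List, Sequence


def kept_positions(cell_names: Sequence[str], outliers: Iterable[str]) -> List[int]:
    """Integer positions of the cells that are NOT flagged as outliers."""
    flagged = set(outliers)
    return [i for i, name in enumerate(cell_names) if name not in flagged]


def latent_without_outliers(model: Any, adata: Any, keep: List[int]) -> Any:
    # Assumption: the model exposes get_latent_representation(adata, indices=...),
    # so only the kept cells are embedded and the trained model is left untouched.
    return model.get_latent_representation(adata, indices=keep)


def de_without_outliers(
    model: Any,
    adata: Any,
    labels: Sequence[str],
    keep: List[int],
    group_a: str,
    group_b: str,
) -> Any:
    # Restrict each comparison group to cells that survived the outlier filter.
    kept = set(keep)
    idx1 = [i for i, lab in enumerate(labels) if lab == group_a and i in kept]
    idx2 = [i for i, lab in enumerate(labels) if lab == group_b and i in kept]
    # Assumption: weights="importance" turns on the importance weighting
    # mentioned above; the default is assumed to be uniform weighting.
    return model.differential_expression(
        adata, idx1=idx1, idx2=idx2, weights="importance"
    )
```

With a real model this might be called as `keep = kept_positions(adata.obs_names, outlier_names)` followed by `latent_without_outliers(model, adata, keep)`, where `outlier_names` is whatever list of cell barcodes you flagged. The kept embedding can then be stored on a filtered copy of the AnnData before recomputing neighbors.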