Data scaling for cytovi

Hi guys,

In the cytovi tutorial, I see a data scaling step applied to each batch, cytovi.scale(adata_batch1), which by default scales the data to the range 0-1 using sklearn’s MinMaxScaler.

After correction, what is the best way to “untransform” the data back to the original range? One can run inverse_transform with the fitted MinMaxScaler object, but then should we:

  1. Run inverse_transform per batch, using the MinMaxScaler object fitted for that batch? Or
  2. Fit (but not transform with) a new MinMaxScaler object on data from all batches, then run inverse_transform with this scaler after correction? Or
  3. Some other suggestion

Thanks!

In my opinion, you should fit and save a MinMaxScaler object for each batch before running the internal cytovi scale function. You can then use these objects to untransform each batch after the analysis.
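A minimal sketch of that idea, assuming the scaling is done directly with sklearn so the fitted objects are available afterwards (whether cytovi.scale exposes its own fitted scaler is not something this sketch relies on). The batch arrays and dictionary names are hypothetical stand-ins for the per-batch expression matrices.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical per-batch matrices (cells x markers), for illustration only.
batches = {
    "batch1": rng.normal(loc=2.0, scale=1.0, size=(100, 3)),
    "batch2": rng.normal(loc=3.0, scale=1.5, size=(120, 3)),
}

scalers = {}  # fitted scaler per batch, kept for later
scaled = {}   # data on the 0-1 scale, as passed on to the model
for name, X in batches.items():
    scaler = MinMaxScaler()  # default feature_range=(0, 1)
    scaled[name] = scaler.fit_transform(X)
    scalers[name] = scaler

# Batch correction would happen here; the scaled data is reused unchanged
# as a placeholder for the corrected output.
restored = {
    name: scalers[name].inverse_transform(Xs) for name, Xs in scaled.items()
}
```

With no correction in between, inverse_transform returns the original values, which is a quick sanity check that the right scaler was stored for each batch.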

Of course, you can also fit a single scaler of this kind on all the data pooled together; which option is right depends on what you want to do downstream.
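For the pooled variant, a sketch under the same assumptions could look like the following. The scaler is only fitted on the concatenated batches, never used to transform the input, and is applied in the inverse direction after correction. The random "corrected" matrix is a placeholder, not real model output.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
# Hypothetical raw batches (cells x markers).
batch1 = rng.normal(2.0, 1.0, size=(100, 3))
batch2 = rng.normal(3.0, 1.5, size=(120, 3))

# Fit only: this learns the per-marker min and max over all batches.
global_scaler = MinMaxScaler().fit(np.vstack([batch1, batch2]))

# Placeholder for batch-corrected values on the 0-1 scale.
corrected = rng.uniform(0.0, 1.0, size=(220, 3))

# Map the corrected values onto the pooled original range.
restored = global_scaler.inverse_transform(corrected)
```

Every batch is mapped with the same min and max, so the inverse step itself cannot pull the batches apart again.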

@florianingelfinger, any other thoughts?

The problem with untransforming per batch after the analysis is that it re-introduces differences between batches: applying a different inverse transformation to each batch means the batches will no longer look similar, even if cytovi aligned them well.
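A toy illustration of this effect, using made-up one-marker batches: the two batches are given identical values on the 0-1 scale (standing in for a perfect alignment), yet they come out different once each is inverted with its own scaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw batches with different ranges for a single marker.
batch1 = np.array([[0.0], [5.0], [10.0]])
batch2 = np.array([[2.0], [12.0], [22.0]])

s1 = MinMaxScaler().fit(batch1)
s2 = MinMaxScaler().fit(batch2)

# Pretend the correction aligned both batches perfectly on the 0-1 scale.
aligned = np.array([[0.0], [0.5], [1.0]])

back1 = s1.inverse_transform(aligned)  # spans the range of batch1
back2 = s2.inverse_transform(aligned)  # spans the range of batch2

# Identical inputs, different outputs: the batch difference is back.
print(np.allclose(back1, back2))
```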

What we are considering now is to apply batch-specific scaling, do the alignment with cytovi, and then untransform all batches with the scaler object from one of the batches. In theory, if that scaler is suitable for untransforming the data from its own batch, it should also be suitable for the other batches, since they will look very similar after the alignment.

This avoids fitting a separate scaler, and feels closer to the recommended pipeline of scaling each batch separately rather than scaling all samples together.
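Sketched with sklearn directly, the reference-batch approach might look like this. The batch data, the choice of "batch1" as reference, and the use of the scaled input as a stand-in for the corrected output are all assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(2)
# Hypothetical raw batches (cells x markers).
raw = {
    "batch1": rng.normal(2.0, 1.0, size=(100, 3)),
    "batch2": rng.normal(3.0, 1.5, size=(120, 3)),
}

# Batch-specific scaling, one fitted scaler per batch.
scalers = {name: MinMaxScaler().fit(X) for name, X in raw.items()}

# Placeholder for the corrected output on the 0-1 scale.
corrected = {name: scalers[name].transform(X) for name, X in raw.items()}

# Untransform every batch with the scaler of one reference batch.
reference = "batch1"
restored = {
    name: scalers[reference].inverse_transform(Xc)
    for name, Xc in corrected.items()
}
```

All batches end up on the reference batch's original scale, so the choice of reference determines the units of the final values.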

Further thoughts are of course still welcome.