First, does n_samples refer to the number of cells? (Surely it can’t be the number of biological samples, as the example dataset only has 2 individuals.) If yes, why is the default 1, why is the suggested number in the tutorial 25, and what might be a recommended value to set this to?
n_samples refers to the number of Monte Carlo samples drawn for each cell. The normalized expression is a random variable, and we return the average over 25 samples in this case. That average is an unbiased estimate of the expectation for any n_samples, but the variance of the estimate only shrinks in proportion to 1/n_samples, so you need a LOT more samples to reduce it substantially. Empirically, 25 just seemed to work well.
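To make the bias/variance point concrete, here is a minimal NumPy sketch. It does not use the model at all: the gamma distribution is just an assumed stand-in for one cell's normalized expression, and `mc_estimate` is a hypothetical helper. It only illustrates that averaging more draws leaves the mean of the estimate unchanged while its spread shrinks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a gamma(shape=2, scale=1) variable stands in for the normalized
# expression of one gene in one cell. Its true mean is 2.0.
true_mean = 2.0

def mc_estimate(n_samples: int) -> float:
    """Average of n_samples independent draws (the Monte Carlo estimate)."""
    return rng.gamma(shape=2.0, scale=1.0, size=n_samples).mean()

# Repeat the estimate many times to see how much it fluctuates for each n_samples.
means, spread = {}, {}
for n in (1, 25, 400):
    estimates = np.array([mc_estimate(n) for _ in range(2000)])
    means[n] = estimates.mean()
    spread[n] = estimates.std()
    print(f"n_samples={n:>3}  mean of estimates={means[n]:.3f}  std={spread[n]:.3f}")
```

The mean of the estimates should sit near 2.0 for every n_samples, while the standard deviation drops roughly with the square root of n_samples. That is why going from 1 to 25 helps a lot, and why further gains get expensive.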
Generally, you would take all the batches. The exception is when one cell type is only seen in one batch: there, you’d want to call the function separately for that cell type and not use the transform batch param.
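If it helps, here is a small sketch for spotting cell types that occur in only one batch, so you know which ones to handle in a separate call. The labels below are made up for illustration, and the variable names are hypothetical; in practice you would take the cell type and batch annotations from your own dataset.

```python
from collections import defaultdict

# Hypothetical per-cell annotations (one entry per cell).
cell_types = ["B", "B", "T", "T", "NK", "NK"]
batches = ["b1", "b2", "b1", "b2", "b1", "b1"]

# Collect the set of batches in which each cell type appears.
batches_per_type = defaultdict(set)
for cell_type, batch in zip(cell_types, batches):
    batches_per_type[cell_type].add(batch)

# Cell types seen in exactly one batch are candidates for a separate call.
single_batch_types = sorted(
    cell_type for cell_type, seen in batches_per_type.items() if len(seen) == 1
)
print(single_batch_types)  # ['NK']
```

Here only the NK cells would be processed on their own, without the transform batch param, while the remaining cells can use all the batches.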
The method returns both, so you can simply ignore the RNA denoised expression. Reimplementing it to return only the protein values wouldn’t really save any time, so feel free to just ignore the RNA part for now.
I see. The problem I’m running into is actually that I keep running out of memory, and I was wondering whether returning only the protein expression might reduce the memory usage. If not, I suppose I don’t have any options other than to increase memory or forgo the denoised expression values?