Model which can work on binary data


I have binary data (0/1), not continuous expression data.
Do you think it's possible to use the scvi-tools models on my data?

Thank you.


Kind of, but if you try it, I would highly recommend doing some statistical investigation. The scvi-tools models are modeling count data (they actually don’t model continuous expression data).

Briefly, count distributions arise from adding together binary outcomes (e.g., if you have 7 ‘successes’ and 3 ‘failures’ you have a count of 7, where in general you also keep track of the fact that you looked at 10 cases in total). Some distributions add extra sources of variation on top of this counting procedure (the default ZINB distribution in scvi-tools for scRNA-seq models the counting procedure + differences in efficiencies per observation + counts missing at random).
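To make the "counts are sums of binary outcomes" point concrete, here is a small sketch using only the Python standard library. The function name and the success probability of 0.7 are made up for illustration and are not part of scvi-tools.

```python
import random

def count_successes(n_trials, p, rng):
    """Sum n_trials independent 0/1 outcomes, each equal to 1 with probability p."""
    return sum(1 if rng.random() < p else 0 for _ in range(n_trials))

# The example from the text: 7 successes and 3 failures out of 10 cases.
outcomes = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1]
count = sum(outcomes)       # the count
n_trials = len(outcomes)    # how many cases were looked at

# Repeating the counting procedure many times gives draws from Binomial(n=10, p=0.7).
rng = random.Random(0)
simulated = [count_successes(10, 0.7, rng) for _ in range(1000)]
mean_count = sum(simulated) / len(simulated)  # should be close to 10 * 0.7
```

The ZINB extras (per-observation efficiency differences and zero inflation) are deliberately left out here; this only shows the plain counting step.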

With this in mind, a count of 1 is a count, which differs from a count of 0. You would model the binary case with a Bernoulli distribution. The first step in accumulating binary data is to move to the binomial distribution, where you say “I have x positive cases out of n trials”. But you can set n = 1 and you are back at the Bernoulli case.
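As a quick check of the "set n = 1 and you are back at Bernoulli" statement, the sketch below compares the two probability mass functions directly. The helper names and the value p = 0.3 are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p): x positive cases out of n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def bernoulli_pmf(x, p):
    """P(X = x) for X ~ Bernoulli(p), with x in {0, 1}."""
    return p if x == 1 else 1 - p

p = 0.3
# With n = 1 the binomial pmf matches the Bernoulli pmf for both outcomes.
differences = [abs(binomial_pmf(x, 1, p) - bernoulli_pmf(x, p)) for x in (0, 1)]
```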

What I would do is try the existing models with the likelihood parameter (gene_likelihood in the SCVI model) set to ‘poisson’ as a pilot, and see whether I get anything sensible.

If results are promising, I would use the scvi-tools skeleton (scverse/scvi-tools-skeleton on GitHub, a template repository for creating novel models with scvi-tools) and basically copy in the components of the model you are using, but in the loss function (line 177) replace ZeroInflatedNegativeBinomial with torch.distributions.Bernoulli (see the torch.distributions page of the PyTorch 1.13 documentation).
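For the Bernoulli swap, a rough sketch of what the reconstruction term could look like is below. This is not the skeleton's actual code: the function name is hypothetical, and it assumes your decoder outputs one unconstrained logit per feature rather than the ZINB parameters.

```python
import torch
from torch.distributions import Bernoulli

def bernoulli_reconstruction_loss(x, logits):
    """Negative Bernoulli log-likelihood, summed over features.

    x:      (cells, features) tensor of 0/1 values stored as floats
    logits: (cells, features) tensor, assumed to be the decoder output
    Returns one loss value per cell.
    """
    return -Bernoulli(logits=logits).log_prob(x).sum(dim=-1)

# Toy input: 2 cells, 3 binary features.
x = torch.tensor([[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
logits = torch.zeros_like(x)  # logit 0 means p = 0.5 for every entry
loss = bernoulli_reconstruction_loss(x, logits)  # each cell: 3 * log(2)
```

Passing logits rather than probabilities avoids having to clamp values away from 0 and 1 before taking the log.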

Hope this helps!