Scvi-tools: different DEGs when comparing same data

Hello everybody,

I am new to single-cell and differential expression analysis, so apologies if this question sounds naive. I used scvi-tools to run a differential expression analysis within a single cluster, comparing cells from two different conditions. After setting the seed with scvi.settings.seed = 42, my code looks like this:

    idx1 = (adata.obs['leiden'] == 0) & (adata.obs['Response'] == 'No')
    idx2 = (adata.obs['leiden'] == 0) & (adata.obs['Response'] == 'Yes')
    scvi_de_noresp_vs_resp = model.differential_expression(idx1=idx1, idx2=idx2, batch_correction=True)

Then I focused only on genes for which 'is_de_fdr_0.05' is True (referred to as "significant genes" from now on). Now, the thing is: if I redo the same analysis, on exactly the same data and with exactly the same seed, just swapping idx1 and idx2:

    idx1 = (adata.obs['leiden'] == 0) & (adata.obs['Response'] == 'Yes')
    idx2 = (adata.obs['leiden'] == 0) & (adata.obs['Response'] == 'No')
    scvi_de_resp_vs_noresp = model.differential_expression(idx1=idx1, idx2=idx2, batch_correction=True)

I obtain a different list of significant genes. This is a bit surprising to me because, to my understanding, the log fold change should be symmetrical: its absolute value should be the same, just with the opposite sign. So I would expect to find the same list of significant genes, but this is not the case (although most of the significant genes overlap between the two analyses).
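To make the discrepancy concrete, here is a minimal sketch of how the two result tables could be compared. The two DataFrames below are invented toy data standing in for scvi_de_noresp_vs_resp and scvi_de_resp_vs_noresp, the helper compare_de_calls is hypothetical, and the 'lfc_mean' column name is an assumption about the output table:

```python
import pandas as pd

# Toy stand-ins for the two result tables; all values are invented.
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
de_no_vs_yes = pd.DataFrame(
    {"lfc_mean": [1.2, -0.8, 0.05, 2.1],
     "is_de_fdr_0.05": [True, True, False, True]},
    index=genes,
)
de_yes_vs_no = pd.DataFrame(
    {"lfc_mean": [-1.1, 0.9, -0.02, -2.0],
     "is_de_fdr_0.05": [True, False, False, True]},
    index=genes,
)


def compare_de_calls(de_a, de_b, flag="is_de_fdr_0.05"):
    """Split significant genes into shared calls and calls unique to one run."""
    sig_a = set(de_a[de_a[flag]].index)
    sig_b = set(de_b[de_b[flag]].index)
    return {"both": sig_a & sig_b, "only_a": sig_a - sig_b, "only_b": sig_b - sig_a}


calls = compare_de_calls(de_no_vs_yes, de_yes_vs_no)
shared = sorted(calls["both"])

# If the log fold changes were perfectly symmetric, this sum would be zero
# for every shared gene; small non-zero values indicate sampling noise.
lfc_sum = de_no_vs_yes.loc[shared, "lfc_mean"] + de_yes_vs_no.loc[shared, "lfc_mean"]

print(calls)
print(lfc_sum)
```

With the real tables, the genes in "only_a" and "only_b" are the ones worth inspecting, for example to see whether they sit close to the significance threshold.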

Is this behaviour expected? And if yes, why?

Hi, DE results in scvi-tools are computed by random sampling from the underlying distribution. Fixing the random seed gives the same result for the same computation. However, swapping the indices gives slightly different results (similar to using another seed). Estimates can be noisy, especially for very lowly expressed genes, and I would recommend removing those manually (filtering out genes whose estimated scale, taking the maximum across both groups, is smaller than 1e-5 should be fine).
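The suggested filter could be sketched roughly as follows. This assumes the result table exposes the per-group estimated scales as columns named 'scale1' and 'scale2'; the DataFrame is invented toy data and drop_lowly_expressed is a hypothetical helper, not part of scvi-tools:

```python
import pandas as pd

SCALE_CUTOFF = 1e-5  # threshold mentioned above

# Toy stand-in for a differential expression result table; values are invented.
de = pd.DataFrame(
    {"scale1": [3e-4, 2e-6, 8e-6, 5e-5],
     "scale2": [1e-4, 4e-6, 2e-5, 1e-6],
     "is_de_fdr_0.05": [True, True, True, False]},
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)


def drop_lowly_expressed(de_table, cutoff=SCALE_CUTOFF):
    """Keep genes whose larger estimated scale (across both groups) reaches the cutoff."""
    max_scale = de_table[["scale1", "scale2"]].max(axis=1)
    return de_table[max_scale >= cutoff]


filtered = drop_lowly_expressed(de)
significant = filtered[filtered["is_de_fdr_0.05"]].index.tolist()

print(filtered.index.tolist())
print(significant)
```

Because the filter uses the maximum across both groups, it removes the same genes regardless of which group is passed as idx1.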

Thank you very much for this quick answer, this is extremely helpful. So, in theory, it would be correct to expect the same results in both cases (I mean, that the log fold change should be symmetric), right?
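As a toy illustration of the point about sampling (this is not scvi-tools' actual sampling code, and the gamma distribution and the mc_lfc helper are assumptions chosen only for the demonstration): when a fixed seed drives the same random stream but the two groups consume it in the opposite order, the estimated log fold changes are close to mirror images without being exactly so.

```python
import numpy as np


def mc_lfc(first_mean, second_mean, n_samples=5000, seed=42):
    """Monte Carlo estimate of the median log2 fold change (first vs. second)."""
    rng = np.random.default_rng(seed)
    # The group passed first always consumes the first part of the random stream.
    first = rng.gamma(shape=2.0, scale=first_mean / 2.0, size=n_samples)
    second = rng.gamma(shape=2.0, scale=second_mean / 2.0, size=n_samples)
    return float(np.median(np.log2(first) - np.log2(second)))


lfc_ab = mc_lfc(1e-4, 3e-4)  # group A vs. group B
lfc_ba = mc_lfc(3e-4, 1e-4)  # same seed, groups swapped

print(lfc_ab, lfc_ba, lfc_ab + lfc_ba)
```

Analytically, log2(a/b) = -log2(b/a), so the sum printed last would be exactly zero without sampling noise. In this sketch it is small but non-zero, which is enough to move genes near a threshold from one side to the other.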