Unexpected DE test results in scvi version 1.3.0 (as compared with version 1.1.2)

Hello!

I recently upgraded scvi from version 1.1.2 to 1.3.0 and tried to reproduce a one-vs-all DE test across cell types, but I got drastically different results. I noticed in the release notes that the implementation of the DE test had changed and was wondering whether those changes affected the results. I saw that the default mode was changed to "change", but I had been setting mode="change" manually all along, so I don’t believe this is the source of the difference. Here is the code I used for both the original results (version 1.1.2) and the new results (version 1.3.0).

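# One-vs-all DE test: each cell type is compared against all other cells.
de_df = model.differential_expression(
    adata=adata_filtered,
    groupby="cell_type",
    mode="change",    # set explicitly in both versions
    delta=0.25,       # LFC threshold used in "change" mode
    fdr_target=0.05,  # target false discovery rate
)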

The original results (version 1.1.2) of the one vs all DE test seemed to be pretty accurate reflections of known marker genes across cell types. The new results don’t seem to be as accurate.

One cell type that had drastically different DE results between the two package versions, which I will use as an example here, is Tfh. In the original results, over 2,000 of the 5,000 highly variable genes in my dataset were differentially expressed at an fdr of 0.05. In the new results, only 3 genes were differentially expressed. In general, the new results show lower bayes factors and lower lfc medians. I want to clarify that the non-zeros proportions for all cell types are identical between the original results and the new results so I am confident that there was not any mixup on my end of different input data. I will attach the results for 3 marker genes that showed up as differentially expressed in the original results but that did not in the new results to show the overall trend I’m seeing.


Do you have any recommendations for what to do? Maybe going back to a previous version of scvi or changing DE test parameters in the current version? Any further insight into interpreting the updates that were made in the latest version would be greatly appreciated.

Thank you!
Julianne

I am also experiencing issues with Bayes factors and LFC in the latest version and posted about this yesterday (Problems with Bayes factors and LFC changes using model.differential_expression()). I’m hoping we can get an update that fixes this rather than having to roll back, though rolling back should also work for you.

Let me get back to you tomorrow with the kwargs for DE that fully reproduce the old results. Those are still an option. To help work out better guidelines, could you also post the other columns of the DE table, especially the mean expression in scVI, along with the top 20 genes in the old and new scVI versions?

The changes are as follows. Change mode is now the default (I don’t think this is the issue here). The pseudocounts added before the LFC computation are now much larger: the old code had a typo, using percentile instead of quantile, but the corrected values may now be too large, which is likely the issue for IL21 above. Previously, the p-value was counted for down- and upregulation together, which is quite confusing because a highly varying signal was reported as significant even though the report was about up- or downregulation.

All of the old behaviors are still available as options (although the old two-way test is hard to justify, in my opinion).
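
To make the percentile/quantile point concrete, here is a minimal sketch of how the two NumPy calls differ and how a larger pseudocount shrinks the LFC of a lowly expressed gene. This is not the scvi-tools source: the simulated scales, the `lfc` helper, and the example gene values are all assumptions made up for illustration.

```python
import numpy as np

# Made-up expression scales standing in for genes with zero observed counts
# (illustration only, not taken from any real dataset).
rng = np.random.default_rng(0)
artefact_scales = rng.lognormal(mean=-12.0, sigma=2.0, size=1000)

# np.percentile expects q in [0, 100], so q=0.9 is the 0.9th percentile,
# a value close to the minimum of the array.
eps_percentile = np.percentile(artefact_scales, 0.9)
# np.quantile expects q in [0, 1], so q=0.9 is the 90th percentile.
eps_quantile = np.quantile(artefact_scales, 0.9)


def lfc(scale_a, scale_b, pseudocount):
    """Log2 fold change after adding the same pseudocount to both groups."""
    return np.log2(scale_a + pseudocount) - np.log2(scale_b + pseudocount)


# Hypothetical lowly expressed gene with an 8-fold difference between groups.
scale_a, scale_b = 8e-6, 1e-6
lfc_small_eps = lfc(scale_a, scale_b, eps_percentile)
lfc_large_eps = lfc(scale_a, scale_b, eps_quantile)

print(f"pseudocount via percentile(0.9): {eps_percentile:.3g}")
print(f"pseudocount via quantile(0.9):   {eps_quantile:.3g}")
print(f"LFC with small pseudocount: {lfc_small_eps:.2f}")
print(f"LFC with large pseudocount: {lfc_large_eps:.2f}")
```

With the larger pseudocount, the fold change of a lowly expressed gene is pulled toward zero, so it can fall below `delta` even when the underlying ratio is large. Highly expressed genes are much less affected.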

Thanks for looking into this! I have attached the full DE result dataframe for the top 20 genes using the old scvi version and the new.




You can manually set the pseudocount to 1e-7, and some of the genes should show up again. The test is now biased towards highly expressed genes. I should rethink whether fixing this bug (0.9 percentile instead of quantile) went too far and whether we actually need a lower quantile.
Genes like PRG4 were barely expressed in the previous version but were still reported by the DE function.
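
As a rough sketch of that suggestion, the call from the original post could be repeated with the pseudocount fixed by hand. The helper name `rerun_de_with_pseudocount` is hypothetical, and the `pseudocounts` keyword is assumed to be accepted by `differential_expression`, as it is used elsewhere in this thread; `model` and `adata` are expected to be the trained model and AnnData object from the original post.

```python
def rerun_de_with_pseudocount(model, adata, groupby="cell_type", pseudocounts=1e-7):
    """Re-run the one-vs-all DE test with a manually fixed pseudocount.

    Assumes `model` is a trained scvi-tools model whose
    `differential_expression` method accepts a `pseudocounts` keyword.
    All other arguments mirror the call in the original post.
    """
    return model.differential_expression(
        adata=adata,
        groupby=groupby,
        mode="change",
        delta=0.25,
        fdr_target=0.05,
        pseudocounts=pseudocounts,  # small fixed value instead of the estimated one
    )
```

Usage would look like `de_df = rerun_de_with_pseudocount(model, adata_filtered)`. Keeping every other argument identical makes it easier to attribute any difference in the results to the pseudocount alone.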

Thanks for this suggestion. I wanted to report the results after manually setting pseudocounts=1e-7. The gene list now contains most of the same genes as the original DE test (scvi version 1.1.2), and the Bayes factors are only slightly lower. Attached are the new top 20 results for the Tfh cell type.
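
For anyone who wants to put a number on "most of the same genes", a small sketch like the following compares two top-gene lists. The function name `top_gene_overlap` is hypothetical and the gene names are placeholders, not results from this thread.

```python
def top_gene_overlap(old_genes, new_genes):
    """Summarize which genes two ranked DE gene lists share."""
    old_set, new_set = set(old_genes), set(new_genes)
    shared = old_set & new_set
    union = old_set | new_set
    return {
        "shared": sorted(shared),
        "only_old": sorted(old_set - new_set),
        "only_new": sorted(new_set - old_set),
        # Jaccard index: size of the intersection divided by size of the union.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Placeholder gene names (illustration only, not real results).
old_top = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
new_top = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
summary = top_gene_overlap(old_top, new_top)
print(summary)
```

In practice the two lists would be the index values of the top 20 rows of each DE table for a given cell type.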