I recently upgraded scvi from version 1.1.2 to 1.3.0 and tried to reproduce a one-vs-all DE test across cell types, but I got drastically different results. The release notes mention changes to the implementation of the DE test, and I was wondering whether those changes affected the results. I saw that the default mode was changed to “change”, but I had been setting mode=“change” manually all along, so I don’t believe this is the source of the difference. Here is the code I used for the original results (version 1.1.2) and the new results (version 1.3.0); the code is the same for both.
The original results (version 1.1.2) of the one-vs-all DE test seemed to reflect known marker genes across cell types quite accurately. The new results don’t seem as accurate.
One cell type with drastically different DE results between the two package versions, which I will use as an example here, is Tfh. In the original results, over 2,000 of the 5,000 highly variable genes in my dataset were differentially expressed at an FDR of 0.05. In the new results, only 3 genes were differentially expressed. In general, the new results show lower Bayes factors and lower LFC medians. The non-zero proportions for all cell types are identical between the original and the new results, so I am confident that I did not mix up different input data. I will attach the results for 3 marker genes that were differentially expressed in the original results but not in the new results to show the overall trend I’m seeing.
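When comparing DE counts between versions, it helps to recall how a Bayesian FDR cutoff such as is_de_fdr_0.05 can turn per-gene probabilities into DE calls. The sketch below is a generic posterior-expected-FDR procedure written from scratch with NumPy (rank genes by their probability of being DE, then keep the largest top set whose average of 1 - p stays under the target). The helper name fdr_de_calls is hypothetical, and this is an assumption about the general approach, not a copy of the scvi-tools internals.

```python
import numpy as np

def fdr_de_calls(proba_de, fdr=0.05):
    """Hypothetical helper: flag genes so the posterior expected FDR of the flagged set is <= fdr."""
    proba_de = np.asarray(proba_de, dtype=float)
    order = np.argsort(-proba_de)          # most confident genes first
    sorted_p = proba_de[order]
    # Expected FDR of the top-k set = mean of (1 - p) over those k genes (non-decreasing in k)
    expected_fdr = np.cumsum(1.0 - sorted_p) / np.arange(1, len(sorted_p) + 1)
    n_de = int(np.sum(expected_fdr <= fdr))
    is_de = np.zeros(len(proba_de), dtype=bool)
    is_de[order[:n_de]] = True
    return is_de

print(fdr_de_calls([0.99, 0.97, 0.9, 0.5]))
```

Because the cutoff depends on the whole ranked list, a uniform drop in the probabilities (for example, from LFCs shrinking toward zero) can shrink the DE set from thousands of genes to a handful.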
Do you have any recommendations for what to do? Maybe going back to a previous version of scvi or changing DE test parameters in the current version? Any further insight into interpreting the updates that were made in the latest version would be greatly appreciated.
I’ll get back to you tomorrow with the DE kwargs needed to fully reproduce the old results; those are still an option. To work out better guidelines, could you also post the other columns of the DE table, especially the mean expression in scVI, and the top 20 genes in the old and new scVI versions?
The changes are:
Change mode is now the default (I don’t think this is the issue here).
The pseudocounts added before the LFC computation are much larger. This was a typo in the function (percentile instead of quantile), but maybe they are too large now; this is likely the issue for IL21 above.
Previously, the p-value was computed for down- and upregulation together. This was quite confusing, as a highly variable signal was reported as significant while the report was about up- or downregulation.
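To see why larger pseudocounts mostly affect lowly expressed genes, here is a minimal sketch of a pseudocount-stabilized log fold change, LFC = log2(scale1 + pc) - log2(scale2 + pc). The helper name and the scale values are hypothetical, and the exact way scvi-tools applies its pseudocounts may differ; the sketch only illustrates the general effect.

```python
import numpy as np

def lfc_with_pseudocount(scale1, scale2, pseudocount):
    """Hypothetical helper: log2 fold change of normalized expression with an additive pseudocount."""
    scale1 = np.asarray(scale1, dtype=float)
    scale2 = np.asarray(scale2, dtype=float)
    return np.log2(scale1 + pseudocount) - np.log2(scale2 + pseudocount)

# Two made-up genes with the same 8-fold difference: one lowly, one highly expressed
scale1 = np.array([8e-6, 8e-3])
scale2 = np.array([1e-6, 1e-3])

for pc in (1e-7, 1e-4):
    print(pc, np.round(lfc_with_pseudocount(scale1, scale2, pc), 2))
```

With the larger pseudocount, the lowly expressed gene’s LFC collapses toward zero while the highly expressed gene keeps most of its signal, which matches the bias toward highly expressed genes described in this thread.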
All of the old behaviors are still available as options (although the old two-way test is hard to justify, in my opinion).
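The difference between a pooled two-way call and directional calls can be illustrated with posterior LFC samples for a single gene. This is a hedged sketch: the helper name, delta, and the sample array are hypothetical, and it shows the general idea of splitting P(|LFC| > delta) into an “up” and a “down” part, not the exact test_mode logic in scvi-tools.

```python
import numpy as np

def de_probabilities(lfc_samples, delta=0.25):
    """Hypothetical helper: summarize posterior LFC samples for one gene (delta is an assumed threshold)."""
    lfc = np.asarray(lfc_samples, dtype=float)
    p_up = float(np.mean(lfc > delta))
    p_down = float(np.mean(lfc < -delta))
    return {
        "two_sided": p_up + p_down,   # old style: up- and downregulation pooled
        "up": p_up,
        "down": p_down,
        "null": 1.0 - p_up - p_down,  # |LFC| <= delta
    }

# A noisy gene whose posterior mass is split between strong up and strong down
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(2.0, 0.1, 500), rng.normal(-2.0, 0.1, 500)])
print(de_probabilities(samples))
```

Under the pooled two-sided probability this gene looks certainly DE, yet neither the “up” nor the “down” probability exceeds 0.5, which is the confusing situation described above.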
You can manually set pseudocounts=1e-7, and some of the genes should reappear. The current version is biased towards highly expressed genes. I should rethink whether the fix for this bug (a 0.9 percentile instead of a 0.9 quantile) went too far and we actually need a lower quantile.
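The percentile/quantile mix-up is easy to reproduce with NumPy: np.percentile expects q on a 0 to 100 scale, while np.quantile expects 0 to 1, so passing 0.9 to np.percentile picks a value near the bottom of the distribution instead of the 90th percentile. The expression values below are made up for illustration.

```python
import numpy as np

# Made-up per-gene mean expression values (illustration only)
expression = np.linspace(0.0, 1.0, 1001)

low = np.percentile(expression, 0.9)   # 0.9th percentile: near the minimum
high = np.quantile(expression, 0.9)    # 90th percentile: near the maximum

print(f"np.percentile(x, 0.9) = {low:.4f}")
print(f"np.quantile(x, 0.9)   = {high:.4f}")
```

In this toy example, an offset based on the 90th percentile is 100 times larger than one based on the 0.9th percentile, which illustrates how fixing the typo could make the pseudocounts much larger.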
In the previous version, genes like PRG4 were reported as DE by the function even though they were barely expressed.
Thanks for this suggestion. I wanted to report the results after manually setting pseudocounts=1e-7. The gene list now contains most of the same genes as the original DE test (scvi version 1.1.2), with slightly lower Bayes factors. Attached are the new top 20 results for the Tfh cell type.
Hi there, I wanted to follow up here! Adjusting the pseudocounts produced a DE gene list that I expected to see biologically, but it did not reproduce the original results. If possible, it would be great if you could share guidelines for fully reproducing the old results with the updated DE test. Thank you.
I am facing similar issues with the new scVI version, except that in my case the situation is even more dire: all the significant results are gone! I have tried many combinations of the kwargs, including setting pseudocounts to zero, and nothing seems to work.
Yes, I have. It didn’t work either.
By the way, I think mode “change” may already have been the default before. When I ran it with “vanilla”, the results did not include the “proba_not_de” column (a second probability column had a different name), whereas I had always gotten “proba_not_de” before, even without specifying the mode.
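If you are unsure which mode produced an existing results table, the column names offer a quick check. This sketch assumes, following the observation above, that mode=“change” tables contain proba_de and proba_not_de, while mode=“vanilla” tables use a different pair, assumed here to be proba_m1 and proba_m2. The helper name is hypothetical, and the exact column names should be checked against your own output.

```python
def infer_de_mode(columns):
    """Hypothetical helper: guess the DE mode from result-table column names (names are assumptions)."""
    cols = set(columns)
    if {"proba_de", "proba_not_de"} <= cols:
        return "change"
    if {"proba_m1", "proba_m2"} <= cols:
        return "vanilla"
    return "unknown"

print(infer_de_mode(["proba_de", "proba_not_de", "bayes_factor", "lfc_median"]))
```

With a pandas DataFrame, you can pass df.columns directly.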
Hi! I also recently upgraded scvi from version 1.2.0 to 1.3.1 and have been seeing some of the changes already mentioned here, such as fewer DE genes in total, lower Bayes factors, and lower LFC magnitudes. I tried the suggestion of setting the pseudocounts to 1e-7. The scvi release notes briefly mention updates to the test, but I have not found any further information in the documentation about how the new test works, how I should run it, and how I should interpret the results differently than in the older version. Any help would be appreciated.
We are looking into the DE discrepancies we have in the current scvi-tools release vs the previous ones.
In the meantime, can you both try running DE with test_mode=“two”, mode=“change”, and pseudocounts=0?
Another option is to run with mode=“vanilla” and pseudocounts=0
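To compare the suggested settings side by side, one option is to keep each set of keyword arguments in a dictionary and loop over them. The parameter names (mode, test_mode, pseudocounts) are the ones quoted in this thread; model stands for an already trained scVI model, and "cell_type" and "Tfh" are placeholders for your own obs column and group. The names DE_CONFIGS and run_de_configs are hypothetical, and whether a given scvi-tools version accepts test_mode is an assumption to verify.

```python
# Settings suggested in this thread; "cell_type" and "Tfh" are placeholders.
DE_CONFIGS = {
    "change_two_pc0": {"mode": "change", "test_mode": "two", "pseudocounts": 0},
    "vanilla_pc0": {"mode": "vanilla", "pseudocounts": 0},
    "change_pc1e-7": {"mode": "change", "pseudocounts": 1e-7},
}

def run_de_configs(model, groupby="cell_type", group1="Tfh", configs=DE_CONFIGS):
    """Hypothetical helper: run one-vs-all DE once per configuration and return tables by name."""
    results = {}
    for name, kwargs in configs.items():
        # group2 is left unset, so group1 is compared against all other cells
        results[name] = model.differential_expression(groupby=groupby, group1=group1, **kwargs)
    return results
```

Keeping the settings in one place also makes it easy to record exactly which kwargs produced each table when reporting results.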
Thanks for your suggestions. I ran the same DE test four times with different scvi versions and different values of pseudocounts and mode. The tests I ran were as follows:
scvi version 1.2.0 with mode=“change”
scvi version 1.3.1 with mode=“change” and pseudocounts=1e-7
scvi version 1.3.2 with mode=“change”, test_mode=“two” and pseudocounts=0
scvi version 1.3.2 with mode=“vanilla” and pseudocounts=0
The DE test with version 1.3.2 and mode=“vanilla” had no LFC column in the results.
The results of each test are summarized in plots. The volcano plots show lfc_median against bayes_factor for each gene; points in red are genes with is_de_fdr_0.05=True. The histograms show the distribution of the Bayes factor values.
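To compare the runs numerically as well as visually, a small helper can count the genes called DE, check how many of them have a negative Bayes factor, and tabulate the Bayes factor distribution shown in the histograms. The inputs are assumed to be arrays taken from the bayes_factor and is_de_fdr_0.05 columns; the helper name and the bin edges are arbitrary choices for illustration.

```python
import numpy as np

def summarize_de_run(bayes_factor, is_de, bins=(-10, -3, 0, 3, 10)):
    """Hypothetical helper: count DE genes, check Bayes factor signs among them, and bin all Bayes factors."""
    bf = np.asarray(bayes_factor, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    # Clip so values outside the edges land in the outermost bins
    counts, _ = np.histogram(np.clip(bf, bins[0], bins[-1]), bins=bins)
    return {
        "n_de": int(is_de.sum()),
        "n_de_negative_bf": int(np.sum(is_de & (bf < 0))),
        "bf_hist": counts.tolist(),
    }
```

A nonzero n_de_negative_bf is the pattern that the Bayes factor sign fix discussed later in the thread addresses.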
Thanks for this information! I just ran a DE test with scvi version 1.3.3, and the results seem more reasonable to me in that significant DE genes now have positive Bayes factors.
The fix was to the sign of the Bayes factors for test_mode=“three” (the default for mode=“change”), so, as you see, significant LFCs now have positive Bayes factors. In addition, the pseudocount offsets were made smaller, so a smaller subset of genes is treated as background noise, but they can still give inflated LFC values. (pseudocounts defaults to None; in mode “change”, this means the pseudocounts are estimated from the non-significant genes in order to estimate the LFC correctly.)
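The sign convention is easiest to see from how a log Bayes factor can be formed from a posterior probability. In this sketch it is taken as the log-odds log(p) - log(1 - p), where p is the probability of being DE; under equal prior odds this equals the log Bayes factor, and a positive value means the data favor DE. Swapping p and 1 - p flips the sign, which is the kind of error a sign fix corrects. The helper name and the eps guard are hypothetical choices, not the scvi-tools code.

```python
import numpy as np

def log_bayes_factor(proba_de, eps=1e-8):
    """Hypothetical helper: log-odds of DE vs not DE; positive when proba_de > 0.5."""
    p = np.asarray(proba_de, dtype=float)
    return np.log(p + eps) - np.log(1.0 - p + eps)

print(np.round(log_bayes_factor([0.99, 0.5, 0.01]), 2))
```

A gene with a 99% probability of being DE gets a large positive value, and a gene with a 1% probability gets the mirror-image negative value.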
By default, the newest scvi-tools version runs DE with mode “vanilla” and pseudocounts=None. We plan to make “change” the default mode once we are sure everything works.