DE analysis with model.SCVI: which lfc indicates gene up-/down-regulation?

Hello!

Thank you for the wonderful scVI-tools! I am new to scRNA-seq data analysis. Briefly, I used scanpy to concatenate four samples (two replicates from each of two conditions, all 10X datasets), performed preprocessing, trained an scVI model, computed neighbors from scVI's latent representation of the data, then computed a UMAP and clustered with the Leiden method. Subsequently, I computed differential expression a) between the two conditions (global), and b) between the cells from the two conditions within each Leiden cluster.

I have the following questions regarding scvi.model.SCVI.differential_expression() output:

  1. The output provided the following logFoldChange columns: lfc_mean, lfc_median, lfc_std, lfc_min, lfc_max.
    For the genes that are DE (is_de_fdr_0.05=True), how can I tell if the gene expression goes higher or lower? Should I just use “lfc_mean” for this purpose?

  2. Almost always lfc_min is negative and lfc_max is positive, whereas the lfc_mean may be positive or negative. So the gene expression is lower in some cells (of a condition, for example), and higher in others? And the lfc_mean is providing the direction of change averaged over all the cells? Do I understand it correctly?

  3. Sometimes the “raw_mean1”, “raw_mean2”, “raw_normalized_mean1”, and “raw_normalized_mean2” are all zero, yet the genes are DE nonetheless. Is that okay?

You can find a snippet of the output with selected columns below.

Thank you,
Sam

                      lfc_mean  lfc_median   lfc_std    lfc_min    lfc_max  raw_mean1  raw_mean2  raw_normalized_mean1  raw_normalized_mean2  is_de_fdr_0.05
gene                                                                                                                                                        
si:dkey-208b23.5      0.479600    0.439014  6.084463 -14.936892  15.690244   0.000000   0.000000              0.000000              0.000000            True
ENSDARG00000024602    0.679054    0.649739  7.324370 -18.278358  17.349012   0.000000   0.000000              0.000000              0.000000            True
drc1                  0.540901    0.562465  6.162925 -17.544102  15.701396   0.000000   0.000000              0.000000              0.000000            True
esrp1                 0.558089    0.535105  6.800211 -18.745285  16.212980   0.000000   0.000000              0.000000              0.000000            True
gfi1ab                0.561095    0.544334  6.464262 -19.063986  15.834004   0.000000   0.000000              0.000000              0.000000            True
zgc:103559            0.497405    0.481609  5.919106 -16.137840  14.522301   0.000000   0.000000              0.000000              0.000000            True
napsa                 0.440470    0.389575  6.322702 -16.612000  14.962067   0.000000   0.000000              0.000000              0.000000            True
myzap                 0.483889    0.502490  5.984181 -16.628532  14.994525   0.000000   0.000000              0.000000              0.000000            True
CR354435.1            0.520383    0.471522  6.401921 -14.583641  15.076799   0.000000   0.000000              0.000000              0.000000            True
dnah12                0.454225    0.424026  5.315034 -13.046480  13.362019   0.000000   0.000000              0.000000              0.000000            True
fut9b                 0.456071    0.466859  6.132898 -17.295963  15.211704   0.000000   0.000000              0.000000              0.000000            True
si:ch211-113a14.22-1  0.463434    0.436661  5.820310 -13.541477  13.315475   0.000000   0.000000              0.000000              0.000000            True
gpr184                0.514271    0.445153  4.942262 -11.320700  12.297648   0.000000   0.000000              0.000000              0.000000            True
cyp4v8                0.279370    0.257111  5.054646 -13.266130  12.379461   0.000000   0.000000              0.000000              0.000000            True
si:ch1073-228h2.2     0.470916    0.417501  6.080115 -14.715637  14.257307   0.000000   0.000000              0.000000              0.000000            True
rgs13                 0.548236    0.522250  6.027261 -18.523365  14.883591   0.000000   0.000000              0.000000              0.000000            True
si:dkeyp-73d8.6       0.471345    0.417540  5.409026 -13.083514  13.284245   0.000000   0.006431              0.000000              0.019727            True
zgc:64051             0.522500    0.487912  5.545477 -15.674596  13.653400   0.000000   0.000000              0.000000              0.000000            True
srd5a2b               0.524634    0.413060  5.227385 -13.321875  12.613957   0.003236   0.003215              0.027218              0.037259            True
si:dkey-192g7.3       0.387664    0.402621  4.858399 -12.267364  11.662878   0.000000   0.006431              0.000000              0.053546            True
tent5ab               0.634892    0.472449  6.047723 -13.824921  13.967886   0.000000   0.000000              0.000000              0.000000            True
gpd1a                 0.591813    0.543583  5.842104 -14.230432  14.385897   0.000000   0.000000              0.000000              0.000000            True
CU984600.2            0.505643    0.457640  5.578200 -13.251217  13.337477   0.000000   0.000000              0.000000              0.000000            True
dbh                   0.651233    0.666539  4.201104 -11.775505  12.334496   0.223301   0.115756              1.375563              0.451738            True
ano6                  0.523729    0.489827  5.218942 -14.128026  13.381668   0.000000   0.000000              0.000000              0.000000            True
zgc:162331            0.432170    0.369040  4.980867 -11.879374  12.364733   0.000000   0.000000              0.000000              0.000000            True
oip5                  0.330404    0.330091  4.209458 -10.174562  10.167267   0.000000   0.000000              0.000000              0.000000            True
ctrb1                -0.249836   -0.376842  4.786657 -13.227513  29.091158   0.009709   0.000000              0.669568              0.000000            True
mc3r                  0.409198    0.240824  5.219241 -12.307832  12.104459   0.000000   0.006431              0.000000              0.042580            True
ebf3b                 0.559580    0.609398  3.631861 -10.932887   9.725893   0.000000   0.000000              0.000000              0.000000            True
tmx3a                 0.322164    0.324352  3.622740  -8.779639   8.504380   0.003236   0.000000              0.024187              0.000000            True
si:dkey-101k6.5       0.314722    0.264050  4.238637  -9.610153   9.698996   0.000000   0.000000              0.000000              0.000000            True
c16h2orf66            0.542751    0.535967  3.325066  -8.829274   9.562749   0.000000   0.003215              0.000000              0.021196            True
CABZ01057928.1        0.310531    0.245671  4.498628 -10.107723   9.703278   0.000000   0.006431              0.000000              0.032090            True
ENSDARG00000075540    0.398474    0.312719  4.399720 -10.870052  11.148718   0.000000   0.000000              0.000000              0.000000            True
sh2d4ba               0.240636    0.170192  3.310970  -9.403920  10.003074   0.012945   0.000000              0.090937              0.000000            True
poc1b                 0.642688    0.513015  5.234146 -11.732223  12.384102   0.000000   0.003215              0.000000              0.206118            True
si:dkey-57n24.6       0.218827    0.166036  3.520045 -10.438939   9.694427   0.000000   0.003215              0.000000              0.027960            True
tbx2a                 0.679166    0.644877  3.437314  -8.924354   9.783237   0.000000   0.000000              0.000000              0.000000            True
colec12               0.290175    0.205803  3.577260  -9.736406  20.785942   0.003236   0.000000              0.022090              0.000000            True
si:ch211-284k5.2      0.199817    0.094964  3.240186 -10.024623  10.127933   0.009709   0.012862              0.081583              0.117200            True
col6a3-1             -0.412441   -0.498374  3.123330  -9.520643  13.368073   0.000000   0.000000              0.000000              0.000000            True
zgc:113054            0.256077    0.233528  3.476520 -10.191195   9.285287   0.003236   0.000000              0.028264              0.000000            True
urp1                 -0.115161   -0.199985  4.026823 -12.646875  24.443443   0.339806   0.009646             16.555485              0.373933            True
calml4a               0.233648    0.158408  3.125443  -9.542439   9.866495   0.000000   0.003215              0.000000              0.047707            True
CR848032.2           -0.457329   -0.593560  3.813791 -11.415813  25.767803   0.006472   0.012862              0.120338              0.678105            True
cass4                 0.341258    0.263085  3.951136  -9.347710   9.992567   0.000000   0.000000              0.000000              0.000000            True

@PierreBoyeau would be the best person to assist!


Hi Sam,

scVI uses a Bayesian framework for analysis. Very briefly, this means that, given the data, a quantity such as the fold change is not a single number but has a range of plausible values. Abstractly, there is a conditional probability distribution $P([\text{Fold change}] \, |\, [\text{Data}])$, which scVI estimates through various sampling strategies.

The lfc_min and lfc_max are the smallest and largest log fold changes that were drawn during the sampling process. They reflect samples from the distribution $P([\text{Fold change}] \, |\, [\text{Data}])$, not observed cells or data.

The columns lfc_mean and lfc_median report different measures of the 'most typical' value of the distribution $P([\text{Fold change}] \, |\, [\text{Data}])$. Unfortunately, it is somewhat ambiguous which notion of 'average' is most appropriate.

Personally, I prefer median over mean, but I don’t have a clear motivation for why.
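To make the sampling picture concrete, here is a minimal sketch that does not use scVI at all. It assumes, purely for illustration, that the sampled log fold changes for one gene look like draws from a normal distribution with mean 0.5 and standard deviation 6 (roughly the scale in the table above); the distribution, the number of draws, and the exact way the summaries are computed are my assumptions, not what scVI does internally.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in for posterior samples of the log fold change of ONE gene.
# Each value is a draw from the distribution, not a measurement from a cell.
samples = [random.gauss(0.5, 6.0) for _ in range(5000)]

# Summaries analogous to the lfc_* columns of the output.
summary = {
    "lfc_mean": statistics.fmean(samples),
    "lfc_median": statistics.median(samples),
    "lfc_std": statistics.stdev(samples),
    "lfc_min": min(samples),
    "lfc_max": max(samples),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With a wide distribution like this one, the most extreme draws land far on either side of zero even though the mean and median are modestly positive, which is the same pattern as in the posted output.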

Regarding direction, the columns comparison, group1, and group2 indicate what is being compared to what. The fold changes are always of the form group1 / group2. So a positive log fold change means that expression is higher in group1 than in group2, and a negative log fold change means that expression is higher in group2 than in group1.
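As a small sketch of how this could look in practice: the snippet below builds a toy pandas DataFrame from three rows of the output posted above (dbh, CR848032.2, col6a3-1) and splits the DE genes by the sign of lfc_median. The variable names are made up for illustration, and using lfc_median rather than lfc_mean is just my preference from above.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by differential_expression();
# the values are copied from three rows of the output posted in the question.
de_df = pd.DataFrame(
    {
        "lfc_median": [0.666539, -0.593560, -0.498374],
        "is_de_fdr_0.05": [True, True, True],
    },
    index=["dbh", "CR848032.2", "col6a3-1"],
)

# Keep only the genes flagged as DE, then split by the sign of the log fold change.
de_hits = de_df[de_df["is_de_fdr_0.05"]]
up_in_group1 = de_hits[de_hits["lfc_median"] > 0].index.tolist()  # higher in group1
up_in_group2 = de_hits[de_hits["lfc_median"] < 0].index.tolist()  # higher in group2

print("higher in group1:", up_in_group1)
print("higher in group2:", up_in_group2)
```

On a real result you would check the group1 and group2 columns first, so you know which condition each list refers to.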

Hope this is helpful!
/Valentine
