Differential expression analysis

Hi,

I have integrated multiple single-cell datasets using scVI. Now I’m working on annotating the data. The first step of manual annotation is to conduct differential expression analysis between clusters. It seems that SCVI.differential_expression by default conducts differential expression analysis on the original expression matrix (X), though the clusters are assigned according to the latent space embedding. I wonder if this makes sense, since the batch effect that lies in X hasn’t been removed. Should I instead force SCVI.differential_expression to conduct differential expression analysis on the corrected expression matrix (the expected frequency f_w(z_n, s_n) in the paper)?

Thanks,
Yuqi


Hi, thank you for your question. SCVI uses the batch-corrected normalized expression values (denoted as \rho_n in the manuscript) for computing differentially expressed genes. You can see the relevant code portion here
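For reference, here is a minimal Python sketch of a one-vs-rest DE call on a trained model. It assumes the cluster labels sit in an `adata.obs` column (the name `leiden` and the helper `cluster_markers` are hypothetical). In the scvi-tools versions I am familiar with, `differential_expression` also takes a `batch_correction` flag that defaults to `False`; setting it to `True` asks the model to account for batch when comparing groups, so it is worth checking the default in your installed version.

```python
def cluster_markers(model, adata, cluster_key="leiden", cluster=None):
    """Run scVI differential expression for clusters stored in adata.obs.

    `model` is assumed to be a trained scvi.model.SCVI instance.
    `cluster_key` is a hypothetical obs column holding cluster labels.
    With cluster=None, each cluster is compared against the rest.
    """
    return model.differential_expression(
        adata,
        groupby=cluster_key,
        group1=cluster,
        batch_correction=True,  # assumption: you want batch accounted for
    )

# Usage, assuming `model` and `adata` already exist:
# de_table = cluster_markers(model, adata, cluster_key="leiden")
```

The helper takes the model as an argument so it can be reused across models trained on different subsets of the data.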


I had no idea scanpy had differential expression. That would be awesome if I never had to leave Python to get differential expression.

What method does it use for the algorithm?

Is it possible to get a batch-adjusted gene-wise matrix for manual exploration? I read the tutorial titled Integrating Datasets with scVI in R, but it only shows the subsequent step of differential expression analysis, not how to get the genes by cells matrix of batch-corrected values. The tutorial also does not explicitly state whether batch-corrected or uncorrected values are being used. Based on its wording, I have the same misunderstanding as Yuqi: I thought that the batch correction is only used for assigning cells to clusters. It would be worthwhile to state this explicitly to avoid doubt.

I am interested in the values for genes on chrY in women, who do not have that chromosome and hence have zero counts for those genes in the Cell Ranger output matrix. scVI should preserve such basic biological differences, which are easily available from patients’ clinical data and introductory biology textbooks, but I am unsure how to check this.

Hey @Dario

As @martinkim0 previously mentioned, SCVI uses the batch-corrected normalized expression values for performing DE.

Sorry if the tutorials are unclear on this. You can infer it from the fact that the actual DE is performed with the trained model, through the function get_normalized_expression (see code). That is also the function you need in order to get your genes by cells matrix of batch-corrected values, including in R.
Just use: model$get_normalized_expression()
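A minimal Python sketch of the same call, plus one way to run the chrY check asked about above. It assumes a trained `scvi.model.SCVI` object is passed in as `model`; the helper names, `reference_batch`, and the `sex` column are hypothetical. Two details worth knowing: by default the result is a DataFrame of cells by genes, so transpose it if you want genes by cells, and when `transform_batch` is left as `None` each cell is decoded using its own observed batch. Passing a batch name decodes every cell as if it came from that batch.

```python
from statistics import mean


def batch_corrected_expression(model, adata, reference_batch, genes=None):
    """Decode every cell as if it came from `reference_batch`.

    `model` is assumed to be a trained scvi.model.SCVI instance.
    Returns normalized expression with cells as rows and genes as columns.
    """
    return model.get_normalized_expression(
        adata,
        transform_batch=reference_batch,
        gene_list=genes,
        return_numpy=False,
    )


def mean_by_group(values, groups):
    """Average one gene's values within each group label (e.g. sex)."""
    buckets = {}
    for value, group in zip(values, groups):
        buckets.setdefault(group, []).append(float(value))
    return {group: mean(vals) for group, vals in buckets.items()}


# Usage, assuming `model` and `adata` exist and adata.obs has a "sex" column:
# expr = batch_corrected_expression(model, adata, "batch_1", genes=["UTY"])
# print(mean_by_group(expr["UTY"], adata.obs["sex"]))
```

Comparing the per-sex means for a few chrY genes against the raw counts would show how well the model preserves that difference; I have not run this on real data, so I am not stating what the output should look like.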
