Differential expression between datasets

grst · May 7, 2021, 12:38pm

Hi all,

I was wondering if scVI’s differential expression module still works in the following case:

I integrated two datasets with scANVI:

dataset1: healthy lung tissue, mixture of immune, stromal, endothelial and epithelial cells, 10x
dataset2: lung cancer tissue, also a mixture of the same cell types, Smart-seq2

Judging from the UMAP plot, the integration worked reasonably well.

For each cell type, I would like to compare healthy vs. tumor samples.

With any classical DE method (e.g. a linear model), I'd expect the results of this comparison to mainly reflect the technical differences between the 10x and Smart-seq2 platforms. Since scANVI appears to successfully reduce the batch effects in the latent representation, I was wondering whether comparisons between datasets from different platforms become feasible as well.

Best regards,
Gregor

Valentine_Svensson · May 7, 2021, 5:15pm

As you suspect, you will have the same issue as with a linear model: you cannot tell whether the fold changes you find are due to technical differences or to disease status.
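A toy illustration of that confounding: if every healthy cell comes from 10x and every tumor cell from Smart-seq2, the condition indicator and the platform indicator in a linear model are identical columns, so the design matrix is rank-deficient and the two effects cannot be estimated separately. The sketch below is plain Python with made-up cell counts and a small hand-written rank function (not a library routine); it only checks the rank of the design matrix.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a pivot row at or below the current rank position.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical cells: 4 from dataset1 (healthy, 10x), 4 from dataset2 (tumor, Smart-seq2).
condition = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = tumor
platform = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = Smart-seq2
# Columns: intercept, condition, platform.
design = [[1, c, p] for c, p in zip(condition, platform)]

print(len(design[0]), matrix_rank(design))  # prints 3 2
```

Three coefficients but rank 2: the data cannot distinguish a tumor effect from a platform effect. With a balanced toy design (both conditions measured on both platforms) the same function returns 3, i.e. the two effects become separable.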

To do this kind of differential expression, where you have confounding between data source and condition, the easiest solution is to find more data sources. Then you can treat each data source as a replicate.

If you have three independent healthy lung data sets and three independent lung cancer data sets, you can integrate them to harmonize the cell type annotations, then do within-cell-type differential expression between cancer and healthy across all six data sets.
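With six data sets, each data set becomes one replicate of its condition, and the variation between data sets within a condition is the noise the comparison has to overcome. A minimal sketch for one gene in one cell type, assuming per-data-set pseudo-bulk log2-CPM values have already been computed (the numbers are invented). Welch's t statistic is used here only as a simple stand-in for a dedicated bulk DE tool such as edgeR, DESeq2 or limma.

```python
import math
from statistics import mean, variance

# Hypothetical pseudo-bulk log2-CPM values of one gene in one cell type,
# one value per data set: three healthy and three cancer data sets.
healthy = [5.1, 4.8, 5.4]
cancer = [6.9, 7.3, 6.6]


def welch_t(a, b):
    """Welch's t statistic for mean(b) - mean(a), with data sets as replicates."""
    se_sq = variance(a) / len(a) + variance(b) / len(b)
    return (mean(b) - mean(a)) / math.sqrt(se_sq)


log2_fc = mean(cancer) - mean(healthy)
print(log2_fc, welch_t(healthy, cancer))
```

The key design choice is that n = 3 per group counts data sets, not cells, so the thousands of cells per data set do not inflate the apparent evidence.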

grst · May 10, 2021, 8:59am

Hi Valentine,

thanks for your quick reply and for your suggestion!
To make sure I got it right: by "treating the datasets as replicates", do you mean integrating them with scVI, specifying the dataset as the batch variable, and then simply using SCVI.differential_expression() for the comparison?

Or are you rather referring to “pseudo-bulk” comparisons, such that every dataset becomes a sample?

Best,
Gregor

Valentine_Svensson · May 20, 2021, 5:39am

At the moment scVI can’t deal with hierarchical samples (I think?). I use generalized linear mixed models, but pseudo-bulk should also work. You can split the full data by the (harmonized) cell type annotation and by data set.
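The split described above (one pseudo-bulk sample per harmonized cell type and data set) amounts to summing raw counts over the cells in each group. A minimal sketch in plain Python; the function names `pseudobulk` and `cpm`, the toy counts and the labels are hypothetical. In practice the input would be the raw count matrix plus the harmonized annotations, not scVI's latent representation.

```python
from collections import defaultdict


def pseudobulk(counts, cell_types, datasets):
    """Sum raw counts over cells sharing a (cell type, data set) pair.

    counts: list of per-cell count vectors (same gene order for every cell).
    cell_types, datasets: per-cell labels aligned with counts.
    Returns {(cell_type, dataset): summed count vector}.
    """
    n_genes = len(counts[0])
    sums = defaultdict(lambda: [0] * n_genes)
    for vec, ct, ds in zip(counts, cell_types, datasets):
        acc = sums[(ct, ds)]
        for g, c in enumerate(vec):
            acc[g] += c
    return dict(sums)


def cpm(vec):
    """Counts per million for one pseudo-bulk sample."""
    total = sum(vec)
    return [1e6 * c / total for c in vec]


# Toy example: 2 genes, 5 cells from two data sets (labels are hypothetical).
counts = [[3, 1], [2, 0], [5, 5], [1, 4], [0, 2]]
cell_types = ["T cell", "T cell", "T cell", "B cell", "B cell"]
datasets = ["dataset1", "dataset1", "dataset2", "dataset1", "dataset2"]

bulk = pseudobulk(counts, cell_types, datasets)
print(bulk[("T cell", "dataset1")])  # prints [5, 1]
```

Each resulting vector is one sample; within a cell type, these samples can then be compared between conditions with a bulk DE method, using data set as the unit of replication.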

grst · May 20, 2021, 7:25am

Thanks!

For future reference: I found this article really helpful in explaining the point of using mixed effects models for single-cell DE analyses.
