Hi, I have spent a few days learning how totalVI analyzes CITE-seq data, and I am a bit confused by the contrasting steps between Scanpy and totalVI.
I have used Scanpy for 10x scRNA-seq for over 2 years now and I love it.
The typical steps are as follows:
- Read in the data.
- Normalize and log-transform.
- Generate highly variable genes.
- Regress out, scale, PCA, neighbors, UMAP, and Leiden.
For TOTALVI, I realize it is quite different:
- Read in the data.
- Normalize and log-transform.
- Generate highly variable genes. The method in this step is very different! Why?
- vae = scvi.model.TOTALVI, vae.train. This is unique to scvi/TOTALVI.
Please advise: are these the appropriate steps to analyze CITE-seq data using scvi/TOTALVI?
totalVI replaces the regress-out, scale, and PCA steps of Scanpy. How you select genes is up to you (we use the seurat_v3 flavor of the function in Scanpy, which requires the raw counts and not the normalized data). All the other differences you see are there to ensure that the AnnData object has data in the correct slots.
Thanks for the explanation. I have CITE-seq data from donors 1 to 3, and each donor has a control and a treated sample. How can I set up the batch so I can see the clustering results for donors 1, 2, and 3, or for control versus treated?
I would set this up as six batches of [donor] x [treatment condition]. This means the latent representation will describe cell-to-cell variation that is not explained by donor-to-donor variation or by differences due to treatment. This way you can cluster your data and define cell types. Once you have annotated the cells into e.g. 10 cell types, you can do differential expression per cell type between treated and control.
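For example, the combined batch key can be built as a simple string concatenation over the per-cell metadata and then passed as `batch_key` to setup_anndata (the column names here are hypothetical):

```python
import pandas as pd

# Hypothetical per-cell metadata: 3 donors x 2 treatment conditions
obs = pd.DataFrame({
    "donor":     ["donor1", "donor1", "donor2", "donor2", "donor3", "donor3"] * 10,
    "treatment": ["control", "treated"] * 30,
})

# One batch per (donor, treatment) combination -> 6 batches
obs["batch"] = obs["donor"] + "_" + obs["treatment"]
print(sorted(obs["batch"].unique()))

# then, on the real AnnData:
# scvi.model.TOTALVI.setup_anndata(adata, batch_key="batch", ...)
```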
This is a very effective way to learn how various cell types in a given context react differently to a perturbation and, by investigating the DE genes, in what way they may be communicating with each other.
I am also interested in doing DE analysis, but I am not sure which values and methods to use. So far, I have exported raw gene counts and denoised protein values to R for DE analysis on pseudobulked samples, using edgeR and limma, respectively.
Does this approach make sense?
What I do depends on how complex the experimental design is. In straightforward settings I use scVI. If there is a nested or hierarchical design, I pseudobulk each sample and use GLMMs.
I don’t think you should do statistics on denoised values (without incorporating uncertainty about the denoising). The confidence/credible intervals will no longer be related to your observed data.
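The pseudobulk step itself can be sketched with plain pandas: sum the raw counts within each (sample, cell type) group, then export those sums for edgeR/limma. All names here (samples, cell types, genes) are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical raw counts: 300 cells x 50 genes
counts = pd.DataFrame(
    rng.poisson(2, size=(300, 50)),
    columns=[f"gene{i}" for i in range(50)],
)
# Hypothetical per-cell annotations (sample of origin and cell type)
meta = pd.DataFrame({
    "sample":    rng.choice(["d1_ctrl", "d1_trt", "d2_ctrl", "d2_trt"], size=300),
    "cell_type": rng.choice(["T", "B", "NK"], size=300),
})

# Sum raw counts per (sample, cell type): one pseudobulk profile per group
pseudobulk = counts.groupby([meta["sample"], meta["cell_type"]]).sum()
```

Because the aggregation is a plain sum of raw counts, the totals are preserved and the result stays a valid count matrix for count-based methods like edgeR.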
For the sc.pp.highly_variable_genes step, I used to do this:
sc.pp.highly_variable_genes(adata, min_mean=0.0125, max_mean=3, min_disp=0.5)
What are the best practices for selecting highly variable genes?
I appreciate your insights!