scVI integration with all genes

martibonomi · December 5, 2023, 7:07pm

Hello!

I would like to integrate all my samples using all the genes instead of only the top 2000 most variable genes.

I would like to do this to better characterise the variability and differences between my batches, and then run differential gene expression (DGE) analysis on my cells using all the genes.

However, not all the samples have the same number of genes, so I would do the following:


import os

import anndata as ad

anndata_dir = 'C:/Users/Martina/Desktop/AnnData'
list_files = os.listdir(anndata_dir)
anndata_list = []

# Read every AnnData file in the directory
for filename in list_files:
    file_path = os.path.join(anndata_dir, filename)
    anndata_obj = ad.read_h5ad(file_path)
    anndata_list.append(anndata_obj)

# Outer join keeps the union of genes across all samples
concatenated_anndata = ad.concat(anndata_list, axis=0, join='outer')

This way, cells from batches in which certain genes are missing get a column for each of those genes, filled with zero counts.
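
To make the effect of the join concrete, here is a minimal sketch in plain Python of which genes an inner versus an outer join would keep, and which genes each batch would have padded. The batch names and gene names are made up for illustration; in practice the gene lists would come from each loaded object's `var_names`.

```python
# Hypothetical gene lists, one per batch. With real data these could be
# built as list(adata.var_names) for each loaded AnnData object.
genes_per_batch = {
    "batch_A": ["GeneA", "GeneB", "GeneC"],
    "batch_B": ["GeneB", "GeneC", "GeneD"],
    "batch_C": ["GeneA", "GeneB", "GeneC", "GeneD"],
}

gene_sets = [set(genes) for genes in genes_per_batch.values()]

# join='inner' would keep only the genes present in every batch.
shared_genes = sorted(set.intersection(*gene_sets))

# join='outer' would keep every gene seen in at least one batch.
all_genes = sorted(set.union(*gene_sets))

# Genes that an outer join has to pad, listed per batch.
padded_genes = {
    batch: sorted(set(all_genes) - set(genes))
    for batch, genes in genes_per_batch.items()
}

print("shared genes:", shared_genes)
print("union of genes:", all_genes)
for batch, missing in padded_genes.items():
    print(batch, "would be padded for:", missing)
```

Keeping a record like `padded_genes` would make it possible to tell padded zeros apart from observed zeros later on.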

Would you recommend doing this (integrating samples using all the genes)? Does the model then effectively remove batch effects and correct the counts? Would I get good DGE results? Or do you recommend a different approach? If so, what would you recommend?

The only thing that makes me doubt this approach is the DGE results. Please correct me if I’m wrong, but by concatenating all the samples with the ad.concat function and the join='outer' setting, I add zero counts for genes in cells for which I have no information about those genes. I would then expect the DGE results to be biased, since I assigned zero expression to those genes in those cells: the fact that I am missing information for those genes does not mean that these cells do not express them.
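
A toy illustration of this concern, in plain Python with invented numbers: one gene is measured in the first group of cells and only padded in the second. The `group2_measured` mask is a hypothetical bookkeeping list, not something `ad.concat` produces by itself.

```python
from statistics import mean

# Toy counts for a single gene.
# Group 1: cells from a batch where the gene was measured.
group1_counts = [4, 6, 5, 5]

# Group 2: cells from a batch where the gene was absent from the matrix,
# so the outer join filled it with zeros.
group2_padded = [0, 0, 0, 0]
group2_measured = [False, False, False, False]  # hypothetical mask

# Naive comparison: the padding is treated as true zero expression.
naive_diff = mean(group1_counts) - mean(group2_padded)

# Mask-aware comparison: only use cells where the gene was measured.
observed_group2 = [
    count for count, measured in zip(group2_padded, group2_measured) if measured
]
if observed_group2:
    masked_diff = mean(group1_counts) - mean(observed_group2)
else:
    masked_diff = None  # no observations in group 2, so nothing to compare

print("naive difference in means:", naive_diff)
print("mask-aware difference in means:", masked_diff)
```

In the naive case the gene looks strongly different between the groups, while the mask-aware version shows there is simply no data to compare.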

What is your opinion on this?

Thank you a lot for your help!!
