[Usage clarification] Should the .obs in query and reference be exactly the same?

yojetsharma · September 13, 2024, 7:37pm

I have a query (iPSC-derived precursors, snRNA-seq, merged from 1 healthy and 2 patient samples) and a reference (a human brain scRNA-seq dataset). The query dataset contains ‘sample’ and ‘leiden’ columns, and the reference contains ‘sample’ and ‘cell_type’. I renamed the ‘leiden’ column in the query to ‘cell_type’ to match the reference. After gene intersection and concatenation, I ran the scVI tutorial with default parameters. However, my query never overlaps with the reference. Is that because the ‘cell_type’ labels in my query (which are essentially Leiden cluster numbers) differ from those in the reference?
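For readers following along, here is a minimal sketch of the gene intersection and concatenation step described above. It uses small pandas DataFrames as toy stand-ins for the two count matrices; the gene names, cell names, and the ‘dataset’ column are hypothetical and not taken from the poster's data. With real AnnData objects, `anndata.concat(..., join="inner")` plays the same role.

```python
import numpy as np
import pandas as pd

# Toy count matrices (cells x genes). All names here are illustrative only.
rng = np.random.default_rng(0)
reference = pd.DataFrame(
    rng.poisson(2.0, size=(4, 4)),
    index=[f"ref_cell{i}" for i in range(4)],
    columns=["SOX2", "PAX6", "GFAP", "MAP2"],
)
query = pd.DataFrame(
    rng.poisson(2.0, size=(3, 3)),
    index=[f"query_cell{i}" for i in range(3)],
    columns=["PAX6", "SOX2", "NES"],
)

# Gene intersection: keep genes measured in both datasets, in reference order.
query_genes = set(query.columns)
shared_genes = [g for g in reference.columns if g in query_genes]

# Concatenation: stack the cells of both datasets over the shared genes.
combined = pd.concat([reference[shared_genes], query[shared_genes]], axis=0)

# Record where each cell came from so the two sources stay distinguishable.
origin = pd.Series(
    ["reference"] * len(reference) + ["query"] * len(query),
    index=combined.index,
    name="dataset",
)
```

Keeping an explicit origin column makes it easy to colour a UMAP by dataset afterwards and check whether query and reference cells actually mix.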

cane11 · September 13, 2024, 8:13pm

In scVI the content of these columns doesn’t matter (scANVI, by contrast, will separate cells based on the cell-type column). However, you are combining two different cell sources (iPSC-derived and brain) and two different technologies (scRNA-seq and snRNA-seq). The technology difference alone is already hard to integrate, and the difference in cell source also sounds like it comes with strong differences in gene expression. If you really want to integrate both datasets, something like Seurat rPCA/CCA integration might be more effective.
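A rough sketch of the distinction drawn above, assuming an AnnData object `adata` with raw counts in `adata.X` and the ‘sample’ and ‘cell_type’ columns from the original post. The helper names (`mask_query_labels`, `fit_scvi`, `fit_scanvi`) and the "Unknown" placeholder are illustrative choices, not part of scvi-tools. scVI only registers the batch column, so label contents play no role; scANVI does use the labels, which is why query cells would normally be marked as unlabeled rather than given Leiden cluster numbers.

```python
import numpy as np


def mask_query_labels(obs, is_query, labels_key="cell_type", unlabeled="Unknown"):
    """Return a copy of obs in which query cells carry the unlabeled placeholder."""
    obs = obs.copy()
    mask = np.asarray(is_query, dtype=bool)
    labels = obs[labels_key].astype(str)
    # Keep reference labels, overwrite query labels (e.g. Leiden numbers).
    obs[labels_key] = labels.where(~mask, unlabeled)
    return obs


def fit_scvi(adata, batch_key="sample"):
    """Train scVI. Only batch_key is registered, so label columns are ignored."""
    import scvi  # imported lazily; requires scvi-tools to be installed

    scvi.model.SCVI.setup_anndata(adata, batch_key=batch_key)
    model = scvi.model.SCVI(adata)
    model.train()
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model


def fit_scanvi(scvi_model, adata, labels_key="cell_type", unlabeled="Unknown"):
    """Extend a trained scVI model to scANVI, which does use the label column."""
    import scvi

    scanvi_model = scvi.model.SCANVI.from_scvi_model(
        scvi_model, unlabeled_category=unlabeled, labels_key=labels_key
    )
    scanvi_model.train()
    adata.obsm["X_scANVI"] = scanvi_model.get_latent_representation()
    return scanvi_model
```

Usage would look roughly like `adata.obs = mask_query_labels(adata.obs, adata.obs["dataset"] == "query")` followed by `fit_scvi(adata)`, where the ‘dataset’ column is again an assumed marker of each cell's origin.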

yojetsharma · September 13, 2024, 8:43pm

In the manuscript from the Treutlein and Theis labs, I came across the following in the methods section:

We compared the data integration performance across the following latent representations of the data: unintegrated PCA, RSS (default parameters except for using 2 layers, latent space of size 30 and negative binomial likelihood) integration, scANVI (default parameters) integrations using either snapseed level 1, 2 or 3 annotation as cell type label input, scPoli (parameters shown above) integrations using either snapseed level 1, 2 or 3 annotation or all three annotation levels at once as cell type label input, scPoli36 integrations of meta-cells aggregated with the aggrecell algorithm (first employed as “pseudocell”) using either snapseed level 1 or 3 annotation as cell type label input to scPoli. We used the following scores for determining integration.

I’m also interested to know why integrating scRNA-seq and snRNA-seq data would be difficult. If it is, would a human brain reference generated with snRNA-seq be more appropriate?
