Integration with scVI

terzoli · November 28, 2022, 3:57pm

Hi, I have a dataset from three patients, with colon, liver and blood samples from each of them. I am wondering which method is best for correcting batch effects while minimizing the loss of biological signal.

In the “scvi.model.SCVI.setup_anndata” command:
scvi.model.SCVI.setup_anndata(
    Dataset,
    layer="counts",
    batch_key="…"
)
is it better to set batch_key to “tissue” (tissue of origin), to “patients”, or to “batch_id”, where batch_id identifies each sample (colon_patient1, colon_patient2, etc.)?

Thanks,
Sara

Valentine_Svensson · November 30, 2022, 2:16am

Hi Sara,

It depends on how you want to use the learned representation. With integration, the only thing that changes is what the latent variables represent (and potentially the options for normalization in differential expression).

I would use the learned representation to define cell types that are consistent across tissues, patients and batches. In this case, I would set batch_key to each sample. For example, this way you can identify a population of cells you can call ‘Macrophages’, with a large transcriptional program that determines this type. Then you can ask the question ‘how are colon macrophages different from liver macrophages?’ using the .differential_expression() method. If you then find a gene that is different, you can see how much variability there is between donors or batches by using .get_normalized_expression().
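A minimal sketch of this per-sample setup, not a definitive recipe. It assumes `adata.obs` has the `tissue` and `patients` columns named in the question, plus a hypothetical `cell_type` column holding annotations made on the integrated representation. The helper names are illustrative, and scvi is imported inside the functions so the label helper can be used on its own.

```python
def make_batch_ids(tissues, patients):
    """Build one batch label per cell, e.g. 'colon_patient1'."""
    return [f"{t}_{p}" for t, p in zip(tissues, patients)]


def fit_per_sample_model(adata):
    """Train scVI with every tissue/patient sample treated as its own batch."""
    import scvi  # imported lazily; requires scvi-tools

    # Assumed obs columns: "tissue" and "patients" (names taken from the question).
    adata.obs["batch_id"] = make_batch_ids(adata.obs["tissue"], adata.obs["patients"])
    scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key="batch_id")
    model = scvi.model.SCVI(adata)
    model.train()
    # Store the integrated latent space for clustering and annotation.
    adata.obsm["X_scVI"] = model.get_latent_representation()
    return model


def compare_macrophages(model, adata, gene):
    """Colon vs. liver macrophages, then normalized expression of one gene."""
    # "cell_type" is a hypothetical annotation column, not part of the original post.
    is_mac = (adata.obs["cell_type"] == "Macrophages").to_numpy()
    colon = is_mac & (adata.obs["tissue"] == "colon").to_numpy()
    liver = is_mac & (adata.obs["tissue"] == "liver").to_numpy()
    de = model.differential_expression(idx1=colon, idx2=liver)
    # Per-cell normalized expression; group it by batch_id or patients to see donor variability.
    expr = model.get_normalized_expression(adata, gene_list=[gene])
    return de, expr
```

The expression values returned by `compare_macrophages` can then be grouped by `adata.obs["batch_id"]` or `adata.obs["patients"]` to look at donor-to-donor spread for a gene of interest.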

But another option is to say that you fundamentally think that gut macrophages and liver macrophages are so different that you want this variation to be reflected in the learned representation. Perhaps you want to find cells that have extreme ‘gut macrophage’ or ‘liver macrophage’ phenotypes by using the representation. But you think these definitions of cells should be consistent between donors. In this case you would set batch_key to 'patients'.

In practice, when I do these sorts of analyses, as an early step I would typically do multiple variations: no integration, integrate patients, integrate patients_tissues, integrate tissues, integrate patient_tissue_batch, and then investigate qualitatively how the representation of the data changes. This way I can get an idea of what sources of variation contribute to the data.
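One way this comparison could be sketched, assuming the obs columns `patients`, `tissue` and a per-sample `batch_id` already exist (names from the question). The variant names and helper functions are hypothetical, and a separate processing-batch column is left out because the thread does not name one.

```python
def integration_variants():
    """Map a variant name to the obs column used as batch_key (None = no integration)."""
    return {
        "no_integration": None,
        "patients": "patients",
        "tissue": "tissue",
        "patients_tissue": "batch_id",
    }


def embed_all_variants(adata, variants=None):
    """Train one scVI model per variant and collect the latent representations."""
    import scvi  # imported lazily; requires scvi-tools

    variants = variants or integration_variants()
    embeddings = {}
    for name, batch_key in variants.items():
        ad = adata.copy()  # fresh copy so each setup is registered independently
        scvi.model.SCVI.setup_anndata(ad, layer="counts", batch_key=batch_key)
        model = scvi.model.SCVI(ad)
        model.train()
        embeddings[name] = model.get_latent_representation()
    return embeddings
```

Each embedding can be stored under its own key, for example `adata.obsm["X_scVI_" + name]`, and visualized side by side (for instance with a UMAP colored by tissue and by patient) to see which sources of variation each choice removes.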

Hope this is useful!

/Valentine

terzoli · November 30, 2022, 11:01am

Thank you very much Valentine.
Your explanation is very clear and useful.

Best,
Sara
