How to concatenate anndata properly?

cookiemonster · November 2, 2022, 4:08am

I have data from 4 samples and am trying to correct for batch effects, but when I concatenate them the result looks like this:
sc.pp.filter_genes(adata1, min_cells=1)
sc.pp.filter_genes(adata2, min_cells=1)
sc.pp.filter_genes(adata3, min_cells=1)
sc.pp.filter_genes(adata4, min_cells=1)
adata = adata1.concatenate(adata2, adata3, adata4, batch_key='Sample')
adata
AnnData object with n_obs × n_vars = 55320 × 20678
obs: 'Sample'
var: 'gene_ids', 'feature_types', 'genome', 'n_cells-0', 'n_cells-1', 'n_cells-2', 'n_cells-3'

The n_cells column comes out split into one column per sample, so should I be concatenating the datasets before doing filtering, doublet removal, etc.?

PauBadiaM · November 2, 2022, 9:12am

Hi @cookiemonster,

As a general rule, you should always perform quality control (QC) on each sample individually first. Samples can differ quite a bit from one another and may need different filtering thresholds, for example for the doublet score. Once you have performed QC at the sample level, you can merge the samples into a single object using the concatenate method. By the way, I would recommend adding join='outer' to the concatenation, because otherwise you might lose quite a few genes (the default is inner, which keeps only the genes present in every sample).

Regarding the n_cells problem, is this related to your previous question?

If yes, it is odd to store this information in the var attribute of your adata object, since var should only hold metadata for your features (in this case genes). If what you want is the number of cells per sample and cell type, you can compute it after concatenating your samples instead of before:

adata.obs[["Sample", "louvain"]].value_counts().reset_index()

Since the rows of the resulting dataframe correspond neither to cells (obs) nor to genes (var), if you want to store it you could put it in the uns attribute:

adata.uns['n_cells'] = adata.obs[["Sample", "louvain"]].value_counts().reset_index()

Hope this is helpful!

ivirshup · November 3, 2022, 8:13pm

Strongly agree with @PauBadiaM about doing QC (especially doublet detection) per sample first, assuming your inputs are divided like this to start.

But I would prefer anndata.concat over AnnData.concatenate. With anndata.concat, you would not get the n_cells-{n} columns. The exact behavior here is defined by the merge argument.
