Best pre-processing steps for sc.tl.ingest()?

Hi everyone,

I have two questions about pre-processing for Scanpy's ingest tool.

I want to use sc.tl.ingest() to map cell-type labels from a reference dataset onto my “query” dataset. The labels will be transferred per cell barcode rather than per cluster as in the Scanpy vignette (Integrating data using ingest and BBKNN — Scanpy documentation). Manual annotation of Leiden or Louvain clusters is not an option for this dataset; all annotations are at the cell level.

Should the pre-processing steps I use for my reference dataset be the same as those in the pbmc3k tutorial (Preprocessing and clustering 3k PBMCs — Scanpy documentation) referred to in the vignette?

My query dataset consists of two AnnData objects that were each pre-processed as in the pbmc3k tutorial and then integrated into one AnnData object with Harmony (sc.external.pp.harmony_integrate()). Do I need any additional pre-processing steps before passing this query dataset to ingest()?

I am pretty new to data annotation, so all suggestions are welcome 🙂
Thanks!