Hello, and first of all, thank you for the great tools and environment.
I have a question about data integration.
I have four human pancreas datasets: Baron, Muraro, Segerstolpe, and Xin (all from scRNA-Seq Datasets). I want to integrate three of them and use the result as training data to predict the cell types of the remaining one with scANVI and with another semi-supervised cell type annotation tool (scNym).
I am having trouble doing that, though. I converted the datasets to AnnData, but they share only some cell types while others are dataset-specific, and their features have the same problem. In some datasets there are even features with the same name plus a numeric suffix, such as A1BG and A1BG_2.
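To make the mismatch concrete, here is a minimal sketch of how the overlap could be inspected. The cell type and feature lists are toy values, not the real contents of the datasets, and the helper names (`shared_and_extra`, `suffixed_duplicates`) are hypothetical. It also assumes that a name like `A1BG_2` is a duplicate of `A1BG` only when the unsuffixed name is present in the same dataset.

```python
import re

# Toy stand-ins for the real datasets (illustrative values only).
cell_types = {
    "Baron": {"alpha", "beta", "delta", "ductal"},
    "Muraro": {"alpha", "beta", "delta", "mesenchymal"},
    "Segerstolpe": {"alpha", "beta", "ductal", "epsilon"},
}
features = {
    "Baron": ["A1BG", "A1BG_2", "INS", "GCG"],
    "Muraro": ["A1BG", "INS", "GCG", "SST"],
    "Segerstolpe": ["A1BG", "INS", "PPY"],
}


def shared_and_extra(groups):
    """Split each group's items into those shared by all groups and the rest."""
    shared = set.intersection(*(set(items) for items in groups.values()))
    extra = {name: set(items) - shared for name, items in groups.items()}
    return shared, extra


# Assumption: duplicates look like "<base>_<digits>".
SUFFIX = re.compile(r"^(?P<base>.+)_\d+$")


def suffixed_duplicates(names):
    """Map each base name to its suffixed copies, if the base itself is present."""
    present = set(names)
    duplicates = {}
    for name in names:
        match = SUFFIX.match(name)
        if match and match.group("base") in present:
            base = match.group("base")
            duplicates.setdefault(base, [base]).append(name)
    return duplicates


shared_types, extra_types = shared_and_extra(cell_types)
shared_features, extra_features = shared_and_extra(features)
baron_duplicates = suffixed_duplicates(features["Baron"])
```

With real data, the same checks could be run on the label column and the feature names of each AnnData object.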
For now, I use only the features common to all 4 datasets and run the prediction after merging some cell types, but in this setting the training and test datasets still don't have perfectly matching labels. (To get more common features, I also merged features with the same name within a dataset by averaging them.)
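Roughly, the preprocessing described above looks like the following sketch. It uses plain dictionaries instead of AnnData so that it stays self-contained; all values are toy data and all function names are hypothetical. Two assumptions are worth flagging: every trailing `_<digits>` is treated as a duplicate marker (which could wrongly merge genuine identifiers that end that way), and labels that are not shared after merging are mapped to a placeholder `"Unknown"`, which is only one possible way to handle them.

```python
import re
from statistics import mean


def base_name(gene):
    """Strip a trailing numeric suffix, e.g. 'A1BG_2' -> 'A1BG' (assumed convention)."""
    return re.sub(r"_\d+$", "", gene)


def collapse_duplicates(expr):
    """expr maps gene -> per-cell values; average rows that share a base name."""
    grouped = {}
    for gene, values in expr.items():
        grouped.setdefault(base_name(gene), []).append(values)
    # zip(*rows) walks the cells; mean() averages the duplicate rows per cell.
    return {gene: [mean(cell) for cell in zip(*rows)] for gene, rows in grouped.items()}


def restrict_to_common(datasets):
    """Keep only the genes present in every dataset, in a fixed sorted order."""
    common = sorted(set.intersection(*(set(d) for d in datasets.values())))
    restricted = {name: {g: d[g] for g in common} for name, d in datasets.items()}
    return restricted, common


def harmonize(labels, mapping, shared, unknown="Unknown"):
    """Rename labels via mapping, then mark anything outside `shared` as unknown."""
    renamed = (mapping.get(label, label) for label in labels)
    return [label if label in shared else unknown for label in renamed]


# Toy expression values: two cells for Baron, one cell for Muraro.
baron = collapse_duplicates({"A1BG": [1.0, 3.0], "A1BG_2": [3.0, 5.0], "INS": [10.0, 0.0]})
muraro = collapse_duplicates({"A1BG": [0.0], "INS": [7.0], "GCG": [2.0]})
restricted, common_genes = restrict_to_common({"Baron": baron, "Muraro": muraro})

# Hypothetical label merge: 'activated_stellate' is folded into 'stellate'.
labels = harmonize(
    ["alpha", "activated_stellate", "epsilon"],
    mapping={"activated_stellate": "stellate"},
    shared={"alpha", "stellate"},
)
```

The part I am unsure about is the last step: even after merging, some labels exist in the test dataset but not in the training data (and the other way around).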
Is there a recommended approach or guide for dealing with this kind of integration?