I’m using scArches to compare gene expression profiles between disease and normal samples. My query dataset contains cells from stem cells of healthy individuals and patients, labeled as `WT`, `LSP2`, and `LSP3`. The reference dataset includes samples labeled `Sample-1`, `Sample-2`, and `Sample-3`, derived from healthy brains.
The issue is that the categorical labels in my query and reference datasets do not directly match, and they represent fundamentally different conditions.
How should I handle and map these non-matching categorical labels to ensure compatibility for analysis in scArches?
Specifically:
- Should I create a mapping that reflects the biological context of each sample?
- Are there best practices for aligning categories when they represent different conditions or experimental setups?
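
To make the first option concrete, here is a minimal sketch in plain Python of the kind of mapping I have in mind. It is not taken from the scArches documentation. The column names (`batch`, `condition`, `tissue`) and the `harmonize` helper are hypothetical, and treating `WT` as healthy and `LSP2`/`LSP3` as patient samples is my assumption for illustration. The idea is to keep each original sample label as its own batch category and to put the biological context into separate, shared columns, rather than forcing query labels onto reference labels.

```python
# Hypothetical harmonization of sample labels into shared metadata columns.
# Assumption: WT is the healthy control; LSP2 and LSP3 are patient samples.
QUERY_CONDITION = {"WT": "healthy", "LSP2": "disease", "LSP3": "disease"}

# Reference samples are all described as coming from healthy brains.
REFERENCE_SAMPLES = ("Sample-1", "Sample-2", "Sample-3")


def harmonize(sample_label):
    """Return a dict of harmonized metadata for one sample label.

    The original label is kept unchanged as the batch category; the
    biological context goes into 'condition' and 'tissue'.
    """
    if sample_label in QUERY_CONDITION:
        return {
            "batch": sample_label,
            "condition": QUERY_CONDITION[sample_label],
            "tissue": "stem_cell",
        }
    if sample_label in REFERENCE_SAMPLES:
        return {"batch": sample_label, "condition": "healthy", "tissue": "brain"}
    raise KeyError(f"Unknown sample label: {sample_label}")


# Build one metadata record per sample across both datasets.
all_labels = list(QUERY_CONDITION) + list(REFERENCE_SAMPLES)
metadata = {label: harmonize(label) for label in all_labels}
```

With something like this, the six sample labels would stay distinct as batches, and the disease-versus-healthy comparison would be made on the `condition` column instead of on the raw labels. Is that a sensible way to set it up?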