Hello world!
I’ve read in many papers that when re-clustering a population such as T cells or B cells, the authors recompute the HVGs before integration while excluding the TCR- or BCR-related genes, because these genes are donor-specific, especially the BCR genes.
How can I exclude the TCR- or BCR-related genes from the HVG selection without removing them from the .var of the AnnData object? I still want to evaluate their expression during cell annotation.
The code that I use to calculate the HVGs is the following:
sc.pp.highly_variable_genes(adata,
                            n_top_genes=4000, flavor="seurat_v3",
                            layer="raw", batch_key="sample_id",
                            subset=False)
Thank you so much @ivirshup!!!
Do you know if there’s a repository or a document or anything else where the BCR/TCR related genes are listed (like for the ones related to the cell cycle in the guide)?
The Ensembl gene annotation provides this information in the “Transcript type” column which you can retrieve from Biomart. Other genome annotations should have similar annotations.
More specifically, BCR genes are those with an IG_[VDJC]_(gene|pseudogene) transcript type, and TCR genes are those with a TR_[VDJC]_(gene|pseudogene) transcript type.
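Putting the two parts of this thread together, here is a minimal sketch of one way to do it, not necessarily the approach suggested earlier. It assumes the Biomart transcript types have already been merged into a column of adata.var; the column name `transcript_type` and the helper names `is_vdj_biotype` and `flag_hvgs_without_vdj` are made up for illustration. The idea is to compute the HVGs on a copy without the IG/TR genes and then write the flags back onto the full object, so no gene is dropped from .var.

```python
import re

# Transcript-type pattern from the reply above: IG_* marks BCR gene
# segments, TR_* marks TCR gene segments.
VDJ_BIOTYPE = re.compile(r"(IG|TR)_[VDJC]_(gene|pseudogene)")


def is_vdj_biotype(biotype):
    """Return True if a transcript type matches the BCR/TCR pattern."""
    return VDJ_BIOTYPE.fullmatch(str(biotype)) is not None


def flag_hvgs_without_vdj(adata, biotype_col="transcript_type", **hvg_kwargs):
    """Select HVGs among non-BCR/TCR genes, keeping every gene in adata.var.

    `biotype_col` is an assumed column name holding the Ensembl
    transcript type of each gene; adjust it to your own annotation.
    """
    # Imported here so the helper above works without scanpy installed.
    import scanpy as sc

    is_vdj = (
        adata.var[biotype_col].astype(str).map(is_vdj_biotype).to_numpy(dtype=bool)
    )

    # Run the HVG selection on a copy that lacks the BCR/TCR genes.
    subset = adata[:, ~is_vdj].copy()
    sc.pp.highly_variable_genes(subset, subset=False, **hvg_kwargs)

    # Copy the flags back; BCR/TCR genes are marked as not highly variable.
    hvg = subset.var["highly_variable"]
    adata.var["highly_variable"] = hvg.reindex(
        adata.var_names, fill_value=False
    ).astype(bool)
    adata.var["is_vdj"] = is_vdj
    return adata
```

With the settings from the question, the call would look like `flag_hvgs_without_vdj(adata, n_top_genes=4000, flavor="seurat_v3", layer="raw", batch_key="sample_id")`. The BCR/TCR genes stay in adata and can still be plotted during annotation, while steps that read `adata.var["highly_variable"]` will skip them. Note that the pattern as written does not match types such as IG_pseudogene, so check the transcript types present in your annotation.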