TCR and BCR genes and HVGs selection

pmarzano97 · March 5, 2024, 2:00pm

Hello world!
I’ve read in many papers that when performing a re-clustering of some populations, like T cells or B cells, prior to the step of integration and so on, they re-calculate the HVGs but excluding the TCR- or BCR-related genes, because they are donor-specific, especially when talking about BCR.

Can you help me how to remove the TCR- or BCR-related genes before computing the HVGs selection, but without removing them from the .var of the anndata, since I want to evaluate their expression during the step of cell annotation?

The code that I use to calculate the HVGs is the following:
sc.pp.highly_variable_genes(adata,
n_top_genes = 4000, flavor = “seurat_v3”,
layer = “raw”, batch_key = ‘sample_id’,
subset = False)

Thanks a lot!
Paolo

ivirshup · March 6, 2024, 2:35pm

I think you could do something like:

sub = adata[:, ~tcr_bcr_genes]
hvg_df = sc.pp.highly_variable_genes(sub, inplace=False)
hvg_df.index = sub.var_names  # only necessary in scanpy <1.10

Then grab the info out of the returned dataframe.

adata.var["highly_variable"] = hvg_df["highly_variable"].reindex(adata.var_names, fill_value=False)

pmarzano97 · April 2, 2024, 7:07am

Thank you so much @ivirshup!!!
Do you know if there’s a repository or a document or anything else where the BCR/TCR related genes are listed (like for the ones related to the cell cycle in the guide)?

Thank you!!

ivirshup · April 2, 2024, 8:51am

No problem.

For gene lists I don’t, but maybe @grst does?

grst · April 2, 2024, 9:41am

The Ensembl gene annotation provides this information in the “Transcript type” column which you can retrieve from Biomart. Other genome annotations should have similar annotations.

More speficially, BCR genes are those with IG_[VDJDC]_(gene|pseudogene) and TCR genes those with TR_[VDJDC]_(gene|pseudogene) transcript type.

cane11 · April 3, 2024, 1:02am

As I used it yesterday:

dataset = pybiomart.Dataset(name='hsapiens_gene_ensembl', host='http://www.ensembl.org')
table = dataset.query(attributes=['ensembl_gene_id', 'external_gene_name', 'chromosome_name', 'transcript_biotype'], use_attr_names=True)
merged = pd.merge(
    adata.var,
    table,
    left_on='gene_ids',
    right_on='ensembl_gene_id'
)


merged['transcript_biotype'].value_counts()
adata.var['gene_name'] = adata.var_names
gene_subset = pd.merge(
    adata.var,
    table[table['transcript_biotype'].isin(['protein_coding', 'lncRNA', 'IG_C_gene', 'TR_C_gene'])], # replace here with your filter.
    left_on='gene_ids',
    right_on='ensembl_gene_id',
)
gene_subset.index = gene_subset['gene_name']
gene_subset.index.value_counts()

Topic		Replies	Views
Usage of HVG in scVI scvi-tools gene-selection , scvi	12	2247	March 1, 2022
Scanpy HVG batch_key scanpy	1	1032	March 16, 2023
Selection of HVG in scVI scvi-tools scvi	3	1118	December 20, 2022
HVG selection with multiple batches scRNA-seq	3	971	July 4, 2022
MrVI input and interpretation scvi-tools	23	1231	July 31, 2024

TCR and BCR genes and HVGs selection

Related topics