TotalVI not fully integrating CITEseq cells with GEX and missing protein values

timslittle · February 6, 2025, 12:04pm

Hi Team,

Apologies if I’m doing something wrong here!

I have a CITEseq dataset in which some cells have good-quality RNAseq values but poor-quality ADT values, so the ADT values for those cells have been removed. I used TotalVI to integrate the data (the batch effects are independent of the missing ADT values). This worked brilliantly in every regard, except that the cells with missing ADT values clustered separately from the cells with ADT present.

The tutorial shows successful integration of cells with missing ADT data; however, in that example all cells with missing ADT belong to one batch. I reran the tutorial, but this time removed the ADT values from the same number of randomly chosen cells:

# Remove ADT from random cells (same number of cells as the PBMC5k batch)
import numpy as np

batch = adata.obs["batch"].values.ravel()
n_remove = int((batch == "PBMC5k").sum())
random_indices = np.random.permutation(adata.n_obs)[:n_remove]
adata.obsm["protein_expression"].iloc[random_indices] = np.zeros_like(
    adata.obsm["protein_expression"].iloc[random_indices]
)
# Flag which cells still have protein data (avoids chained assignment on obs)
has_protein = np.ones(adata.n_obs, dtype=bool)
has_protein[random_indices] = False
adata.obs["prot_data"] = has_protein
# Rest of code pretty much the same...

perm_inds = np.random.permutation(len(adata))
sc.pl.umap(
    adata[perm_inds],
    color=[TOTALVI_CLUSTERS_KEY, "batch", "prot_data"],
    ncols=1,
    frameon=False,
)

Is this expected behaviour? I wouldn’t consider this to be integrated, but maybe it is unavoidable if the cells without ADT are not also treated as a batch needing correction? If so, what would be the best way to handle this confounder?

Thanks for the great tool!
Tim

cane11 · February 6, 2025, 4:38pm

Hi, indeed, cells without ADT need to be their own batch. We apply an adversarial classifier on the batch ID to integrate them, and this doesn’t work when cells with and without protein data share the same batch ID. There is an open PR that adds the option of an adversarial key (but the assumption is still that one batch only contains one assay).
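
One way to follow this advice might look like the sketch below: derive a new batch label in which cells without ADT are split off from their original batch, and check that no batch mixes cells with and without protein data. The helper names `split_batch_by_protein` and `mixed_assay_batches` and the `_noADT` suffix are made up for illustration and are not part of scvi-tools; the toy labels stand in for `adata.obs["batch"]` and `adata.obs["prot_data"]` from the question.

```python
from collections import defaultdict


def split_batch_by_protein(batch_labels, has_protein, suffix="_noADT"):
    """Return new batch labels where cells lacking ADT get their own batch.

    Hypothetical helper: the suffix is an arbitrary choice.
    """
    return [
        str(b) if p else f"{b}{suffix}"
        for b, p in zip(batch_labels, has_protein)
    ]


def mixed_assay_batches(batch_labels, has_protein):
    """Return the batches that contain cells both with and without ADT."""
    seen = defaultdict(set)
    for b, p in zip(batch_labels, has_protein):
        seen[b].add(bool(p))
    return sorted(b for b, states in seen.items() if len(states) == 2)


# Toy example: two batches in which half of the cells lost their ADT values.
batch = ["PBMC10k", "PBMC10k", "PBMC5k", "PBMC5k"]
prot_data = [True, False, True, False]

new_batch = split_batch_by_protein(batch, prot_data)
print(new_batch)
print(mixed_assay_batches(batch, prot_data))      # batches mixing both kinds of cells
print(mixed_assay_batches(new_batch, prot_data))  # empty once the batches are split
```

With an AnnData object, the result could be stored as, say, `adata.obs["batch_assay"]` and passed as `batch_key="batch_assay"` to `scvi.model.TOTALVI.setup_anndata` together with `protein_expression_obsm_key="protein_expression"`. This has not been tested on the dataset from the question, so the resulting embedding should still be checked for mixing.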
