TotalVI not fully integrating CITEseq cells with GEX and missing protein values

Hi Team,

Apologies if I’m doing something wrong here!

I have a CITEseq dataset in which some cells have good-quality RNAseq values but poor-quality ADT, so those ADT values have been removed. I tried using TotalVI to integrate it (the batch effects are independent of the missing ADT values). This worked brilliantly in every regard, except that the cells with missing ADT values clustered separately from the cells with ADT present.

The tutorial shows successful integration of cells with missing ADT data; however, in that example all cells with missing ADT belong to one batch. I reran the tutorial, but removed the ADT values from the same number of randomly chosen cells instead:

import numpy as np

# Remove ADT from random cells (as many cells as there are in the PBMC5k batch)
n_remove = int((adata.obs["batch"] == "PBMC5k").sum())
random_indices = np.random.permutation(adata.n_obs)[:n_remove]
adata.obsm["protein_expression"].iloc[random_indices] = np.zeros_like(
    adata.obsm["protein_expression"].iloc[random_indices]
)
# Flag which cells still have protein data (avoids chained assignment on adata.obs)
has_protein = np.ones(adata.n_obs, dtype=bool)
has_protein[random_indices] = False
adata.obs["prot_data"] = has_protein
# Rest of code pretty much the same...

perm_inds = np.random.permutation(len(adata))
sc.pl.umap(
    adata[perm_inds],
    color=[TOTALVI_CLUSTERS_KEY, "batch", "prot_data"],
    ncols=1,
    frameon=False,
)

Is this expected behaviour? I wouldn’t consider this to be integrated, but maybe it is unavoidable if the cells without ADT are not also treated as a batch needing correction? If so, what is the best way to handle this confounder?

Thanks for the great tool!
Tim

Hi, cells without ADT do indeed need to be their own batch. We apply an adversarial classifier on the batch ID to integrate them, and this doesn’t work when cells with and without protein data share the same batch ID. In an open PR, we have the option to add an adversarial key (but the assumption is still that one batch only contains one assay).
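
For anyone landing here later, one way to act on the suggestion above is to derive a new batch label that separates cells with and without ADT inside each original batch. The following is only a minimal sketch: the helper name `split_batch_by_protein`, the `_noADT` suffix, and the `batch_adt` column are hypothetical, and whether this fully resolves the separation seen in the UMAP has not been verified here.

```python
def split_batch_by_protein(batch_labels, has_protein, suffix="_noADT"):
    """Return batch labels in which cells lacking ADT get their own batch.

    batch_labels: iterable of original batch IDs (one per cell)
    has_protein:  iterable of booleans, True if the cell kept its ADT values
    suffix:       hypothetical tag appended to the batch ID of ADT-free cells
    """
    return [
        str(b) if has else f"{b}{suffix}"
        for b, has in zip(batch_labels, has_protein)
    ]


# Toy example with two batches, each containing cells with and without ADT
batches = ["PBMC10k", "PBMC10k", "PBMC5k", "PBMC5k"]
prot_data = [True, False, True, False]
new_batches = split_batch_by_protein(batches, prot_data)
print(new_batches)

# With the AnnData object from the question (assumes obs columns "batch" and
# "prot_data" exist as in the snippet above):
# adata.obs["batch_adt"] = split_batch_by_protein(
#     adata.obs["batch"], adata.obs["prot_data"]
# )
# scvi.model.TOTALVI.setup_anndata(
#     adata,
#     batch_key="batch_adt",
#     protein_expression_obsm_key="protein_expression",
# )
```

Each new batch then contains only one assay type, which matches the assumption stated in the reply. The trade-off is that the model now also corrects between the ADT and no-ADT halves of what was originally a single batch.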