Hi Team,
Apologies if I’m doing something wrong here!
I have a CITE-seq dataset in which some cells have good-quality RNA-seq values but poor-quality ADT values, so those ADT values have been removed. I used totalVI to integrate the data (the batch effects are independent of which cells are missing ADT values), and it worked brilliantly in every regard except one: the cells with missing ADT values clustered separately from the cells with ADT present.
The tutorial shows successful integration of cells with missing ADT data; however, in that example all cells with missing ADT belong to one batch. I reran the tutorial, but this time removed ADT values from the same number of randomly chosen cells:
```python
import numpy as np
import scanpy as sc

# `adata` and TOTALVI_CLUSTERS_KEY are defined earlier in the tutorial
batch = adata.obs["batch"].values.ravel()

# Remove ADT from as many random cells as there are in the PBMC5k batch
n_missing = int((batch == "PBMC5k").sum())
random_indices = np.random.permutation(adata.n_obs)[:n_missing]
adata.obsm["protein_expression"].iloc[random_indices] = 0

# Record which cells still have ADT (avoids chained assignment on adata.obs)
prot_data = np.ones(adata.n_obs, dtype=bool)
prot_data[random_indices] = False
adata.obs["prot_data"] = prot_data

# Rest of code pretty much the same...
perm_inds = np.random.permutation(len(adata))
sc.pl.umap(
    adata[perm_inds],
    color=[TOTALVI_CLUSTERS_KEY, "batch", "prot_data"],
    ncols=1,
    frameon=False,
)
```
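One way to see how the two setups differ is to count, per batch, the proteins that are zero in every cell of that batch. The sketch below assumes (not verified here against the scvi-tools source) that totalVI treats a protein as missing only when it is all-zero across an entire batch. Under that assumption, randomly zeroed cells spread across batches would look like genuine zero counts rather than missing data. It uses small synthetic data, and `all_zero_proteins_per_batch` is a hypothetical helper, not an scvi-tools function.

```python
import numpy as np
import pandas as pd


def all_zero_proteins_per_batch(protein: pd.DataFrame, batch: np.ndarray) -> pd.Series:
    """Count, for each batch, the proteins that are zero in every cell of that batch.

    Hypothetical helper for illustration only; not part of scvi-tools.
    """
    is_zero = protein.eq(0)
    # Group cells by batch label, then ask whether each protein is zero in all of them
    return is_zero.groupby(batch).all().sum(axis=1)


rng = np.random.default_rng(0)
n_cells, n_proteins = 200, 5
# Poisson counts shifted by 1 so no protein is zero by chance
protein = pd.DataFrame(
    rng.poisson(20, size=(n_cells, n_proteins)) + 1,
    columns=[f"prot{i}" for i in range(n_proteins)],
)
batch = np.array(["PBMC10k"] * 100 + ["PBMC5k"] * 100)

# Scenario A (tutorial): zero out ADT for one whole batch
tutorial = protein.copy()
tutorial.loc[batch == "PBMC5k"] = 0

# Scenario B (this post): zero out ADT for the same number of random cells
random_cells = protein.copy()
idx = rng.permutation(n_cells)[:100]
random_cells.iloc[idx] = 0

print(all_zero_proteins_per_batch(tutorial, batch))
print(all_zero_proteins_per_batch(random_cells, batch))
```

In scenario A every protein is all-zero within PBMC5k; in scenario B no batch has an all-zero protein, even though the same number of cells lost their ADT values.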
Is this expected behaviour? I wouldn’t consider this integrated, but perhaps it is unavoidable unless the cells without ADT are also treated as a batch needing correction? If so, what is the best way to approach this confounder?
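If treating ADT availability as part of the batch turns out to be the right approach, one possible way to try it is to build a combined batch key from the existing `batch` and `prot_data` columns, so that cells without ADT form their own sub-batch within each original batch. This is a sketch under that assumption, not a confirmed recommendation. The column name `batch_prot` is hypothetical, and the data below is a synthetic stand-in for `adata.obs`.

```python
import pandas as pd

# Synthetic stand-in for adata.obs: the tutorial's "batch" column plus this post's "prot_data" flag
obs = pd.DataFrame({
    "batch": ["PBMC10k", "PBMC10k", "PBMC5k", "PBMC5k"],
    "prot_data": [True, False, True, False],
})

# Hypothetical combined key: original batch plus whether the cell still has ADT
obs["batch_prot"] = (
    obs["batch"].astype(str)
    + "_"
    + obs["prot_data"].map({True: "ADT", False: "noADT"})
).astype("category")

print(obs["batch_prot"].tolist())

# With the real AnnData, the combined key could then be used as the batch covariate, e.g.:
# scvi.model.TOTALVI.setup_anndata(
#     adata, batch_key="batch_prot", layer="counts",
#     protein_expression_obsm_key="protein_expression",
# )
```

The trade-off is that each original batch is split in two, so the model would also correct for any genuine differences between cells with and without ADT.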
Thanks for the great tool!
Tim