Dear SCVI team,
Many thanks again for the brilliant suite of tools.
May I ask how TotalVI handles missing ADT data?
The scenario would be as follows:
- Gene expression (GEX) and CITEseq counts were generated separately with Cell Ranger.
- The Cell Ranger filtered matrices for CITEseq may miss many good-quality droplets that are present in the gene expression data.
- To preserve good-quality GEX cells, the raw CITEseq matrices were merged with the GEX data, so the result may include droplets where CITEseq quality was poor or mostly ambient.
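To make the scenario concrete, here is a minimal sketch of the merge I mean. The barcodes, counts, and variable names are all made up for illustration; in practice these would come from the Cell Ranger raw and filtered outputs.

```python
import numpy as np

# Hypothetical toy data: barcodes and ADT counts are invented for illustration.
gex_filtered_barcodes = np.array(["AAAC", "AAAG", "AACT", "AAGT"])
adt_raw_barcodes = np.array(["AAAC", "AAAG", "AACT", "AAGT", "ACGT", "TTTT"])
adt_filtered_barcodes = np.array(["AAAC", "AACT", "ACGT"])

rng = np.random.default_rng(0)
# Raw ADT counts, shape (droplets, proteins).
adt_raw_counts = rng.poisson(5, size=(len(adt_raw_barcodes), 3))

# Keep the raw ADT row for every barcode that passed GEX filtering, in GEX order.
row_of = {bc: i for i, bc in enumerate(adt_raw_barcodes)}
adt_merged = adt_raw_counts[[row_of[bc] for bc in gex_filtered_barcodes]]

# Flag which of the retained cells also passed ADT barcode filtering.
adt_pass = np.isin(gex_filtered_barcodes, adt_filtered_barcodes)
```

Cells where `adt_pass` is False are the ones I am worried about: good GEX, but ADT counts that may be mostly ambient.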
The questions are:
1) Would running TotalVI in this context be able to model that the CITEseq values in cells with poor-quality CITEseq counts (but good-quality GEX) are all background?
2) Would the better approach be to zero all the CITEseq counts for cells that did not pass ADT barcode filtering?
2a) If we zero a subset of cells where ADT did not pass filtering, will TotalVI consider ADT missing for these, or do we need to assign them to a separate batch?
This mainly refers to the line "when it’s all zeros, totalVI identifies that the protein data is missing in this “batch”" from this link (Reference mapping with scvi-tools - scvi-tools).
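For clarity, this is roughly what I have in mind for questions 2 and 2a. It is only a sketch with made-up counts and hypothetical names (`adt_pass`, `with_adt`, `no_adt`); whether TotalVI then treats the zeroed cells as missing protein data is exactly what I am unsure about.

```python
import numpy as np

# Hypothetical toy inputs: ADT counts (cells x proteins) and a flag marking
# which cells passed ADT barcode filtering.
adt_counts = np.array([
    [120, 45, 300],
    [3, 1, 2],
    [98, 60, 210],
    [2, 0, 4],
])
adt_pass = np.array([True, False, True, False])

# Question 2: zero all ADT counts for cells that failed ADT filtering.
adt_zeroed = adt_counts.copy()
adt_zeroed[~adt_pass] = 0

# Question 2a: give those cells their own batch label (label names are made up).
batch = np.where(adt_pass, "with_adt", "no_adt")

# Assumption: adt_zeroed would be stored in adata.obsm and batch in adata.obs,
# then passed to scvi.model.TOTALVI.setup_anndata via its
# protein_expression_obsm_key and batch_key arguments.
```

The alternative in 2a would be to keep a single batch label for all cells and rely on the zeroed rows alone.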
Many thanks in advance for any advice.
Best wishes