Working with CellRanger Aggregated Data

Hi Everyone,

I have a couple of questions about the output of cellranger aggr. In particular, I want to know if it performs any sort of batch correction. As far as I can tell, it does not. Does this mean I can run batch correction algorithms on the output? My understanding is that aggr is basically a glorified anndata.concat()… Is that right?

In my experiment, I have two time points (Day 3 and Day 7) with two conditions (Wildtype vs Gene KO) each, which comprise a developmental process from basal cells to terminally differentiated cells. Each time point/condition combination was loaded into a separate GEM well (4 wells in total). I have noticed that my early progenitor cell types from different conditions (WT or KO) end up in different Leiden clusters. My concern is whether this is real biology or a symptom of a batch effect. If cellranger aggr doesn’t run batch effect correction, should I do so?

Another question: I noticed that aggr performs total read normalization, and so I have chosen to omit this step from the standard scanpy QC pipeline during subsequent analyses. Is this the correct way to go about it?

The 10x website says:

By default, reads from each GEM well are subsampled such that all GEM wells have the same effective sequencing depth, measured in terms of reads that are confidently mapped to the transcriptome or assigned to the feature IDs per cell.

So aggr normalizes read depth across GEM wells, which complements the scanpy QC steps rather than replacing them. You will still want to filter outlier cells and normalize by UMI library size, etc. The reason is that there can be many reads per UMI, so equalizing read depth across wells does not equalize UMI counts across cells.

The aggr subsampling step is not something that can be replicated with scanpy, since it works at the read level, whereas scanpy only sees the UMI count matrix.
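To make the reads-versus-UMIs point concrete, here is a toy sketch in plain Python. The barcodes, gene names, and read counts are made up for illustration, and `normalize_total` is a hypothetical helper that only mimics the idea behind scanpy's `sc.pp.normalize_total`; it is not how Cell Ranger or scanpy are implemented. Two cells have the same number of reads but different numbers of UMIs, so per-cell library-size normalization is still needed.

```python
from collections import Counter

# Toy reads as (cell_barcode, gene, umi) tuples; all values are invented.
reads = (
    [("cellA", "GeneX", f"umi{i}") for i in range(3)] * 4    # 12 reads, 3 UMIs
    + [("cellB", "GeneX", f"umi{i}") for i in range(4)] * 2  # 8 reads, 4 UMIs
    + [("cellB", "GeneY", f"umi{i}") for i in range(2)] * 2  # 4 reads, 2 UMIs
)

# Read depth per cell: identical for both cells.
reads_per_cell = Counter(cell for cell, _, _ in reads)

# Collapse duplicate reads so each (cell, gene, umi) is counted once.
unique_molecules = set(reads)
umis_per_cell = Counter(cell for cell, _, _ in unique_molecules)


def gene_umi_counts(molecules, cell):
    """Count UMIs per gene for a single cell."""
    return Counter(gene for c, gene, _ in molecules if c == cell)


def normalize_total(counts, target_sum=10_000):
    """Scale one cell's gene counts so that they sum to target_sum."""
    total = sum(counts.values())
    return {gene: n * target_sum / total for gene, n in counts.items()}


cell_b_norm = normalize_total(gene_umi_counts(unique_molecules, "cellB"))

print(dict(reads_per_cell))  # same read depth for both cells
print(dict(umis_per_cell))   # different UMI totals
print(cell_b_norm)           # cellB rescaled to a common library size
```

In a real analysis, the equivalent steps would be run on the aggr count matrix with scanpy, for example `sc.pp.filter_cells` followed by `sc.pp.normalize_total` and `sc.pp.log1p`.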
