Hello.
I have some datasets that I would like to integrate. I then want to select a few cell types of interest and recluster them. However, I think there may be a problem the second time I select variable genes and train the model, because I'm not sure whether starting from the normalized data is appropriate.
I ran this to normalize the expression, store the normalized values in adata.raw, select variable genes, and cluster downstream.
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata # keep full dimension safe
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True
)
Then I selected the clusters that interested me to cluster them again, with
adata2=adata[adata.obs['leiden_0.6'].isin(['1', '5', '4'])]
Because I probably need a new set of variable genes, I used the line below to get all genes back.
adata2 = adata2.raw.to_adata()
The values in X are the normalized ones:
print(adata2.X)
(0, 18) 2.37902
(0, 20) 2.37902
(0, 47) 2.37902
(0, 68) 2.37902
(0, 84) 3.0247393
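As a sanity check that these are log-normalized values rather than counts, here is a minimal sketch that undoes the log1p step on the two distinct values printed above. It assumes they came from normalize_total(target_sum=1e4) followed by log1p, as in my first block, and that the smaller value corresponds to a raw count of 1 (that last part is only a guess).

```python
import math

# Values copied from the printed matrix above.
low, high = 2.37902, 3.0247393

# Undo log1p to get back the depth-normalized values.
low_norm = math.expm1(low)
high_norm = math.expm1(high)

# If both values are counts scaled by the same factor (1e4 / cell total),
# their ratio equals the ratio of the underlying counts.
ratio = high_norm / low_norm

# Assumption: the smaller value is a raw count of 1, so the cell's
# total count would be roughly 1e4 / low_norm.
implied_total = 1e4 / low_norm

print(f"ratio = {ratio:.3f}, implied total counts = {implied_total:.0f}")
```

The ratio comes out very close to 2, which is what I would expect for counts of 1 and 2 in the same cell, so these look like log-normalized values and not raw counts.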
Finally, I ran this block to cluster these cells of interest again. I commented out the normalization step, as the values are already normalized.
adata2.layers["counts"] = adata2.X.copy()
adata2.raw = adata2 # keep full dimension safe
#sc.pp.normalize_total(adata2, target_sum=1e4)
#sc.pp.log1p(adata2)
sc.pp.highly_variable_genes(
    adata2,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True
)
There are 2 reasons I think something went wrong.
1 - the cells overlap too much
2 - this warning
UserWarning: Make sure the registered X field in anndata contains unnormalized count data.
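My guess is that the warning appears because X no longer holds integer counts. Below is a minimal sketch of that kind of check on toy data. The helper looks_like_counts is hypothetical (it is not part of any library), and I am only assuming the real check is something similar.

```python
import numpy as np

def looks_like_counts(x):
    """Hypothetical helper: True if every entry is a non-negative integer."""
    x = np.asarray(x, dtype=float)
    return bool(np.all(x >= 0) and np.all(x == np.round(x)))

# Toy raw counts: 2 cells x 3 genes (made up for illustration).
raw_counts = np.array([[1, 0, 2], [0, 3, 1]])

# Mimic normalize_total(target_sum=1e4) followed by log1p.
totals = raw_counts.sum(axis=1, keepdims=True)
log_norm = np.log1p(raw_counts / totals * 1e4)

print(looks_like_counts(raw_counts))  # raw counts pass the check
print(looks_like_counts(log_norm))    # log-normalized values do not
```

For a sparse X, the same helper could presumably be applied to the stored nonzero values (adata2.X.data).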
I assumed that the normalization should be performed with all cells present, which is why I decided to save the normalized values instead of the counts. On the other hand, when I run this code but save the raw counts instead, by running
adata.raw = adata # keep full dimension safe
before the normalization, the cells still overlap too much (they do not overlap in the first clustering step).
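A minimal sketch to test my assumption about normalizing with all cells present, on toy data. The function normalize_log is a hypothetical NumPy stand-in for normalize_total with a fixed target_sum followed by log1p. It is not the scanpy implementation, and the counts and the selection mask are made up.

```python
import numpy as np

def normalize_log(counts, target_sum=1e4):
    """Hypothetical stand-in: scale each cell to target_sum, then log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log1p(counts / totals * target_sum)

rng = np.random.default_rng(0)
# Toy counts: 6 cells x 5 genes; +1 avoids cells with zero total counts.
counts = rng.poisson(2.0, size=(6, 5)) + 1

# Toy selection of "clusters of interest" (cells 0, 2 and 3).
keep = np.array([True, False, True, True, False, False])

# Normalize all cells and then subset, versus subset and then normalize.
full_then_subset = normalize_log(counts)[keep]
subset_then_full = normalize_log(counts[keep])

same = np.allclose(full_then_subset, subset_then_full)
print(same)
```

With a fixed target_sum each cell is scaled only by its own total, so in this sketch both orders give the same values. This would not hold with target_sum=None, where scanpy uses the median of the total counts across cells, and the variable gene selection itself still depends on which cells are present.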
Is there anything I am missing?
Best,
Diogo