Clustering subsets of cells

Diogo_de_Moraes · October 15, 2021, 1:13am

Hello.

I have some datasets that I would like to integrate; I then want to select a few cell types of interest and recluster them. However, I think there may be a problem the second time I select variable genes and train the model, because I’m not sure whether starting from the normalized data is appropriate.

I ran the following to normalize the expression, save the normalized values, select variable genes, and cluster downstream.

adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
adata.raw = adata  # keep full dimension safe
sc.pp.highly_variable_genes(
    adata,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True
)

Then I selected the clusters that interested me to cluster them again, with

adata2 = adata[adata.obs['leiden_0.6'].isin(['1', '5', '4'])]

Because I probably need a new set of variable genes, I used the line below to get all genes back.

adata2 = adata2.raw.to_adata()

These genes are normalized

print(adata2.X)
   (0, 18)	2.37902
   (0, 20)	2.37902
   (0, 47)	2.37902
   (0, 68)	2.37902
   (0, 84)	3.0247393

Finally, I ran this block to cluster these cells of interest again. I commented out the normalization step, as the genes are already normalized.

adata2.layers["counts"] = adata2.X.copy()
adata2.raw = adata2  # keep full dimension safe
#sc.pp.normalize_total(adata2, target_sum=1e4)
#sc.pp.log1p(adata2)
sc.pp.highly_variable_genes(
    adata2,
    flavor="seurat",
    n_top_genes=3000,
    layer="counts",
    batch_key="Sample",
    subset=True
)

There are 2 reasons I think something went wrong.
1 - all cells are too overlapped
2 - this warning

UserWarning: Make sure the registered X field in anndata contains unnormalized count data.

I assume the normalization should be performed with all cells present, which is why I decided to save the normalized values instead of counts. On the other hand, when I run this code but save the raw counts instead, by running

adata.raw = adata # keep full dimension safe

before the normalization, the cells are still too overlapped (they are not overlapped in the first clustering step).
Is there anything I am missing?

Best,
Diogo

adamgayoso · October 15, 2021, 5:16pm

A few things:

1 - You should use the seurat_v3 flavor for HVG selection, especially when giving it the count data.
2 - If I understand correctly, you want to rerun scVI on a subset of your data. Have you tried just subclustering using Scanpy’s API? In many cases I would not expect the result to fundamentally change (subclustering on the full latent space compared with recomputing the model).

Diogo_de_Moraes · October 19, 2021, 10:31pm

Hi Adam

I will make sure to use the seurat_v3 flavor, thank you.
I assumed it would be adequate to use the same method for subclustering, and I have noticed many articles that looked for subpopulations of a cell type did something similar. Moreover, since those cells were integrated by scVI, won’t scanpy’s clustering keep the batch effect?
If the results won’t change, can I find potential subpopulations by simply increasing leiden resolution?

pseudonym2 · November 15, 2021, 1:27pm

If memory allows, the following should be possible (please correct me if I’m wrong):

1 - After loading your adata with raw counts, make a copy of it.
2 - Use your original adata to find clusters on level 0.
3 - Add your level 0 annotations to the previously saved copy of your original adata.
4 - Subset according to your clusters, and you will have an object with raw counts that only contains the subclusters of your choice. You can then repeat the standard workflow.

Probably not a very elegant way though.
