I have already completed the standard single-cell analysis workflow, but I found that one cluster consisted of low-quality cells. I want to remove this cluster, and an AI suggested that I should rerun the preprocessing steps because removing a cluster would affect the identification of highly variable genes and the computation of distances.
I also want to connect this with subclustering. Based on several tutorials, I plan to do the following:
    raw_adata = adata.raw.to_adata()
    adata_subset = raw_adata[raw_adata.obs['leiden'] != 'CD4 T'].copy()
    sc.pl.highly_variable_genes(adata_subset)
    sc.pp.scale(adata_subset, max_value=10)
    sc.tl.pca(adata_subset)
    sce.pp.harmony_integrate(adata_subset, 'project')
    sc.pp.neighbors(adata_subset, n_neighbors=20, n_pcs=15, use_rep='X_pca_harmony')
    sc.tl.umap(adata_subset)
    sc.tl.leiden(adata_subset, resolution=0.2)
    sc.tl.rank_genes_groups(adata_subset, groupby="leiden_0.2", method="wilcoxon")

The AI also mentioned that when running rank_genes_groups, I should add the use_raw parameter.

Would this workflow correctly achieve the purpose of reanalyzing the data after removing the low-quality cell cluster? Should I proceed directly from identifying highly variable genes, or should I redo the normalization and log-transformation steps?