Cell type annotation with new data

Hello all,

Could you please tell me how to annotate cell types in new data that was processed with scanpy in the previous steps? Thank you so much!

Hi @Chris

Cell type annotation always requires knowing your biological system, meaning you should know more or less which cell types you should expect in your data. The easiest thing you can do is to identify marker genes with sc.tl.rank_genes_groups, check them per cluster with sc.pl.rank_genes_groups_dotplot and manually annotate them.

In case you are not sure and want a first “draft” annotation, you can always use methods that leverage marker gene databases and automatically annotate your clusters. For example, here is a vignette using decoupler: Cell type annotation from marker genes — decoupler 1.2.1 documentation

In any case, the annotation of your clusters will always depend on the biological question you are trying to answer with your data, so if you use an automated method, always check with someone who knows the system and can validate that the annotations are correct.

Hope this is helpful!


Thanks @PauBadiaM!

I know which tissue the cells come from, and I ran sc.tl.rank_genes_groups. How does scanpy know which cell types to include in the dot plot when using sc.pl.rank_genes_groups_dotplot()?

Before running sc.tl.rank_genes_groups you must have clustered your cells to identify distinct cell populations, for example with sc.tl.leiden. Then, sc.tl.rank_genes_groups identifies which genes are specific (“markers”) for each of these clusters, and sc.pl.rank_genes_groups_dotplot just plots the top N marker genes for each of them. So, to answer your question, the dot plot shows clusters, not cell types, and what you see depends on the clusters that you defined previously. Note that more than one cluster can belong to the same cell type (they will show very similar marker genes); this usually happens for the most abundant cell types in your system.


Sorry, but some things are still not clear to me. For example, say I got 10 clusters and identified 2 marker genes for each cluster. How can I know which cell type to put in the dotplot()?

You should be able to do it based on the genes that you observe. For example, if you see that a cluster has CD19 and PXK as marker genes, you could annotate it as B cells, since these are known markers for B cells. If you are not sure about them, you can look them up one by one and come up with an annotation, or use automated approaches like decoupler, mentioned before.
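
If it helps, the manual lookup can be sketched in plain Python. The KNOWN_MARKERS table and the annotate_cluster helper below are hypothetical and only for illustration; a real reference would come from the literature or a marker database, and the overlap count is a deliberately naive score.

```python
# Hypothetical, tiny reference table (assumption: illustrative only).
KNOWN_MARKERS = {
    "B cells": {"CD19", "PXK", "MS4A1"},
    "T cells": {"CD3E", "CD3D"},
    "NK cells": {"NKG7", "GNLY"},
}


def annotate_cluster(cluster_genes, known_markers=KNOWN_MARKERS):
    """Return the cell type sharing the most markers with cluster_genes, or 'Unknown'."""
    genes = set(cluster_genes)
    best_type, best_overlap = "Unknown", 0
    for cell_type, markers in known_markers.items():
        overlap = len(genes & markers)
        if overlap > best_overlap:
            best_type, best_overlap = cell_type, overlap
    return best_type


# Top marker genes per cluster, e.g. read off the rank_genes_groups results.
cluster_markers = {
    "0": ["CD19", "PXK"],
    "1": ["CD3E", "CD3D"],
    "2": ["ACTB", "GAPDH"],
}
annotation = {cluster: annotate_cluster(g) for cluster, g in cluster_markers.items()}
print(annotation)  # {'0': 'B cells', '1': 'T cells', '2': 'Unknown'}
```

With an AnnData object, such a dictionary can then be applied with adata.obs["cell_type"] = adata.obs["leiden"].map(annotation). Clusters that come out as "Unknown" are the ones to look up by hand.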


Could I use this data to do cell-type annotation?

Indeed! This is the database that decoupler uses to enrich cell types.


While trying to annotate cell types, I ran:

sc.pl.umap(adata, color=['CD3E'], frameon=False, layer='scvi_normalized')


TypeError Traceback (most recent call last)
in
----> 1 sc.pl.umap(adata, color=['CD3E'], frameon=False, layer='scvi_normalized')

5 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/colorbar.py in __init__(self, ax, mappable, **kw)
1228 they will need to be customized again. However, if the norm only
1229 changes values of vmin, vmax or cmap then the old formatter
-> 1230 and locator will be preserved.
1231 """
1232

TypeError: __init__() got an unexpected keyword argument 'location'

Would you suggest a way to fix this? I appreciate it!

Hi moderators,

I am following the tutorial here:

https://decoupler-py.readthedocs.io/en/latest/notebooks/cell_annotation.html

markers = dc.get_resource('PanglaoDB')

markers


TypeError Traceback (most recent call last)
in
----> 1 markers = dc.get_resource('PanglaoDB')
2 markers

4 frames
/usr/local/lib/python3.7/dist-packages/omnipath/_core/downloader/_downloader.py in __init__(self, opts)
52 allowed_methods=["HEAD", "GET", "OPTIONS"],
53 status_forcelist=[413, 429, 500, 502, 503, 504],
---> 54 backoff_factor=1,
55 )
56 )

TypeError: __init__() got an unexpected keyword argument 'allowed_methods'

I tried the suggestion on the decoupler page:

rm /content/.cache/omnipathdb/*

rm: cannot remove ‘/content/.cache/omnipathdb/*’: No such file or directory

Would you please help? I appreciate it!

Hi @Chris

Could you install the latest versions of decoupler and omnipath and try again? Run this:

pip install git+https://github.com/saezlab/decoupler-py
pip install git+https://github.com/saezlab/omnipath

Let me know if it works! :wink:


Your two lines of code cleared the error above :smiley: Thank you!


Another error came up :joy:

sc.pl.umap(acts, color='NK cells')

TypeError Traceback (most recent call last)
in
----> 1 sc.pl.umap(acts, color='NK cells')

5 frames
/usr/local/lib/python3.7/dist-packages/matplotlib/colorbar.py in __init__(self, ax, mappable, **kw)
1228 they will need to be customized again. However, if the norm only
1229 changes values of vmin, vmax or cmap then the old formatter
-> 1230 and locator will be preserved.
1231 """
1232

TypeError: __init__() got an unexpected keyword argument 'location'

I got this error before with different data and still don’t know why :smiling_face_with_tear:
Would you please have a look?

It might be an incompatibility with a recent update of matplotlib, as described in this GitHub issue:

Try downgrading matplotlib to 3.5.2:

pip install matplotlib==3.5.2
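
After pinning, it can be worth confirming which version is actually installed. The snippet below is only a sketch: installed_version and PINNED are names made up here, and importlib.metadata needs Python 3.8 or newer (on Python 3.7, as in your traceback, I assume the importlib_metadata backport would be needed instead).

```python
from importlib.metadata import PackageNotFoundError, version

PINNED = "3.5.2"  # the version suggested above


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("matplotlib")
if found is None:
    print("matplotlib is not installed")
elif found == PINNED:
    print(f"matplotlib {found} matches the pinned version")
else:
    print(f"matplotlib {found} is installed, expected {PINNED}")
```

Keep in mind that in a notebook an already imported matplotlib stays loaded until the kernel (runtime) is restarted, so restart it after reinstalling.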

sc.pl.umap(acts, color='NK cells')

Is it color = 'NK cells' because you know your sample has NK cells? For a new tissue that is not blood, how can I know which value to pass to color? Thank you so much!

Hi @Chris

This is just an example of how to visualize a specific cell type of interest, in this case NK cells. What you can do is identify the most represented cell types in your data with dc.summarize_acts (as shown in the notebook) and then plot them in case you want to check them manually.
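
To give an idea of what that summary looks like, here is a rough pandas sketch of the same idea: average the enrichment scores per cluster, keep the cell types that vary between clusters, and take the top one for each cluster. This is not decoupler's implementation; the scores, cluster labels and the 1.0 threshold are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up enrichment scores: 60 cells x 4 candidate cell types, 3 clusters of 20 cells.
cell_types = ["NK cells", "B cells", "T cells", "Hepatocytes"]
clusters = np.repeat(["0", "1", "2"], 20)
scores = pd.DataFrame(rng.normal(0.0, 0.1, size=(60, 4)), columns=cell_types)
scores.loc[clusters == "0", "NK cells"] += 5.0
scores.loc[clusters == "1", "B cells"] += 5.0
scores.loc[clusters == "2", "T cells"] += 5.0

# Mean score per cluster (rows: clusters, columns: cell types).
mean_scores = scores.groupby(clusters).mean()

# Keep only cell types whose mean score differs between clusters.
informative = mean_scores.loc[:, mean_scores.std(axis=0) > 1.0]

# Highest scoring cell type for each cluster.
best = informative.idxmax(axis=1).to_dict()
print(best)  # {'0': 'NK cells', '1': 'B cells', '2': 'T cells'}
```

The column names that survive the filter are the values worth passing to color in sc.pl.umap, whatever the tissue is.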
